On 03/06/2012 07:29 PM, Chris Bowditch wrote:
On 06/03/2012 11:08, mehdi houshmand wrote:
At least that is transparent to the user, but re-parsing the input is
a sub-optimal solution as it incurs a performance penalty so we should
investigate if there are alternatives first. I can't recall why the
PostScript Painter/Renderer was architected in that way, but that's a
Font de-duping is intrinsically a post-process action: you need the
full document, with all fonts, before you can do any font de-duping.
PostScript does this very thing (to a much lesser extent) with the
<optimize-resources> tag, as a post-process action.
At a guess, because PostScript is much less capable of non-linear
references and access than PDF is. It's more expensive and slower to
forward-reference resources because PostScript has to parse and execute
all the rest of the document to find the resource it wants, while PDF
just seeks to the object at the byte offset referenced in the xref table
and reads only the object it requires.
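
To make the access-pattern difference concrete, here is a toy sketch in
Python. It is not real PDF or PostScript parsing; the function names and
the in-memory "document" layout are assumptions made up for illustration.
It only contrasts looking up a resource through an offset table with
walking a stream in order until the resource turns up.

```python
import io

def build_store(objects):
    """Write payloads back to back and record (offset, length) per name.

    The returned table plays the role of an xref table in this toy model.
    """
    buf = io.BytesIO()
    xref = {}
    for name, payload in objects.items():
        xref[name] = (buf.tell(), len(payload))
        buf.write(payload)
    return buf.getvalue(), xref

def read_via_xref(data, xref, name):
    """PDF-like access: seek straight to the recorded byte offset."""
    offset, length = xref[name]
    stream = io.BytesIO(data)
    stream.seek(offset)
    return stream.read(length)

def read_via_scan(records, name):
    """PostScript-like access: process records in order until found.

    Returns the payload and how many records had to be processed.
    """
    steps = 0
    for rec_name, payload in records:
        steps += 1
        if rec_name == name:
            return payload, steps
    raise KeyError(name)
```

With three records where the wanted font is last, read_via_xref touches
only that font's bytes, while read_via_scan has to process all three
records before it reaches the font.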
Assuming the font name is unique is dangerous, since it's provably true
that in the wild there are numerous subtly (and sometimes grossly)
different fonts with the same name.
The requirements are perfectly clear: given a set of input PDFs and
XSL-FO, create a single merged PDF with a consistent and unduplicated
set of fonts. Why would there be slight kerning differences if the
assumption that the font name is unique holds true?
The font dictionary contains glyph metrics information that, along with
the font name, slant, weight, etc., can be used to match the font rather
more closely. For extra caution, checksums of subset glyphs can be done
to make sure they're *identical*, but honestly that's unnecessary if the
If that assumption is wrong then I agree with what you say. Ultimately,
though, that should be down to the user: they know their fonts, so they
can decide whether to merge them or not via a setting in fop.xconf.
Your argument is not sufficient to say this approach should
never be used. It brings a lot of benefit to users who know their font
names are unique.
It should be safe to do automatically and transparently by default,
because only partially overlapping subsets of identical fonts should
ever be merged. Anything else is a substitution, not a merge of duplicate
subsets, and has entirely different considerations because of the
possibility of visible changes caused by non-matching metrics etc.
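
A rough sketch of the matching rule discussed in this thread, in Python:
two subsets are merged only when name, weight, and slant agree and every
glyph they share has the same advance width and (optionally) the same
outline checksum. The FontSubset class, its fields, and the function
names are hypothetical and exist only for this illustration; none of it
is FOP code. Refusing to merge subsets with no shared glyphs is also an
assumption of the sketch, since there is then nothing to compare.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class FontSubset:
    name: str
    weight: int
    italic: bool
    widths: dict   # glyph name -> advance width
    glyphs: dict   # glyph name -> raw outline bytes

def glyph_digest(outline: bytes) -> str:
    """Checksum of a glyph outline, used to confirm two glyphs are identical."""
    return hashlib.sha256(outline).hexdigest()

def can_merge(a: FontSubset, b: FontSubset, verify_glyphs: bool = True) -> bool:
    """True only if a and b look like subsets of the same font."""
    if (a.name, a.weight, a.italic) != (b.name, b.weight, b.italic):
        return False
    shared = a.glyphs.keys() & b.glyphs.keys()
    if not shared:
        return False  # no overlap, so no evidence the fonts are identical
    for g in shared:
        if a.widths[g] != b.widths[g]:
            return False  # metrics differ: this would be a substitution
        if verify_glyphs and glyph_digest(a.glyphs[g]) != glyph_digest(b.glyphs[g]):
            return False  # same name, different outline
    return True

def merge(a: FontSubset, b: FontSubset) -> FontSubset:
    """Union of two compatible subsets; raises if they do not match."""
    if not can_merge(a, b):
        raise ValueError("subsets do not come from identical fonts")
    return FontSubset(a.name, a.weight, a.italic,
                      {**a.widths, **b.widths},
                      {**a.glyphs, **b.glyphs})
```

Passing verify_glyphs=False corresponds to trusting the metrics alone,
which is cheaper but skips the extra-caution checksum step.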