On 03/06/2012 07:29 PM, Chris Bowditch wrote:
On 06/03/2012 11:08, mehdi houshmand wrote:

Hi Mehdi,

Font de-duping is intrinsically a post-process action: you need the
full document, with all fonts, before you can do any font de-duping.
PostScript does this very thing (to a much lesser extent) with the
<optimize-resources> tag, as a post-process action.
At least that is transparent to the user, but re-parsing the input is a sub-optimal solution because it incurs a performance penalty, so we should first investigate whether there are alternatives. I can't recall why the PostScript Painter/Renderer was architected that way, but that's a separate topic.

At a guess, because PostScript is much less capable of non-linear references and access than PDF is. It's more expensive and slower to forward-reference resources because PostScript has to parse and execute all the rest of the document to find the resource it wants, while PDF just seeks to the object at the byte offset referenced in the xref table and reads only the object it requires.
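
To make the difference in access pattern concrete, here is a minimal sketch. It is a toy container, not real PDF or PostScript parsing: the "N 0 obj" / "endobj" framing only imitates PDF syntax, and the function names (build_toy_file, read_via_xref, read_via_scan) are made up for illustration.

```python
import io


def build_toy_file(objects):
    """Serialise {object_number: payload_bytes}; return (data, xref).

    xref maps each object number to the byte offset where it starts,
    which is the role the cross-reference table plays in a PDF.
    """
    buf = io.BytesIO()
    xref = {}
    for num, payload in objects.items():
        xref[num] = buf.tell()
        buf.write(b"%d 0 obj\n" % num)
        buf.write(payload + b"\nendobj\n")
    return buf.getvalue(), xref


def read_via_xref(data, xref, num):
    """PDF-style access: seek straight to the offset, read one object."""
    stream = io.BytesIO(data)
    stream.seek(xref[num])
    stream.readline()  # skip the "N 0 obj" header line
    lines = []
    for line in stream:
        if line.strip() == b"endobj":
            break
        lines.append(line)
    return b"".join(lines).rstrip(b"\n")


def read_via_scan(data, num):
    """Sequential access: walk the file from the top until the object
    turns up. Returns (payload, number_of_objects_visited)."""
    wanted = b"%d 0 obj" % num
    visited = 0
    current = None
    lines = []
    for line in io.BytesIO(data):
        stripped = line.strip()
        if stripped.endswith(b" 0 obj"):
            visited += 1
            current = stripped
            lines = []
        elif stripped == b"endobj":
            if current == wanted:
                return b"".join(lines).rstrip(b"\n"), visited
        else:
            lines.append(line)
    raise KeyError(num)


objects = {
    1: b"<< /Type /Page >>",
    2: b"<< /Type /Page >>",
    3: b"<< /Type /Font /BaseFont /Foo >>",
}
data, xref = build_toy_file(objects)
font_direct = read_via_xref(data, xref, 3)
font_scanned, visited = read_via_scan(data, 3)
```

Both calls return the same payload, but the scan has to pass over every object that precedes the font, while the xref lookup touches only the one it was asked for.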


The requirements are perfectly clear: Given a set of input PDFs, XSL-FO, create a single merged PDF with a consistent and unduplicated set of fonts. Why would there be slight kerning differences if the assumption that the font name is unique holds true?
Assuming the font name is unique is dangerous, since it's provably true that in the wild there are numerous subtly (and sometimes grossly) different fonts with the same name.

The font dictionary contains glyph metrics information that, along with the font name, slant, weight, etc., can be used to match the font rather more closely. For extra caution, checksums of subset glyphs can be compared to make sure they're *identical*, but honestly that's unnecessary if the metrics match.
If that assumption is wrong then I agree with what you say. Ultimately that should be down to the user, though: they know their fonts, so they can decide whether to merge them or not via a setting in the fop.xconf. Your argument is not sufficient to say this approach should never be used. It brings a lot of benefit to users who know their font names are unique.
It should be safe to do automatically and transparently by default, because only partially overlapping subsets of identical fonts should ever be merged. Anything else is substitution, not merging of duplicate subsets, and has entirely different considerations because of the possibility of visible changes caused by non-matching metrics etc.
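
As a rough sketch of the check I have in mind (not FOP code; FontSubset, same_font and merge_subsets are hypothetical names, and a real implementation would read these values from the PDF font dictionary and embedded font program):

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FontSubset:
    """Hypothetical stand-in for one embedded font subset."""
    name: str
    weight: int
    italic: bool
    widths: dict   # glyph id -> advance width
    glyphs: dict   # glyph id -> outline data as bytes


def glyph_digest(outline):
    """Checksum of one glyph outline, for the extra-cautious comparison."""
    return hashlib.sha256(outline).hexdigest()


def same_font(a, b, check_outlines=False):
    """True if a and b look like subsets of the same font."""
    if (a.name, a.weight, a.italic) != (b.name, b.weight, b.italic):
        return False
    shared = a.widths.keys() & b.widths.keys()
    if not shared:
        # Assumption of this sketch: with no glyphs in common there is
        # no evidence beyond the name, so refuse rather than guess.
        return False
    if any(a.widths[g] != b.widths[g] for g in shared):
        return False
    if check_outlines:
        for g in a.glyphs.keys() & b.glyphs.keys():
            if glyph_digest(a.glyphs[g]) != glyph_digest(b.glyphs[g]):
                return False
    return True


def merge_subsets(a, b):
    """Union of two subsets, allowed only when they match."""
    if not same_font(a, b, check_outlines=True):
        raise ValueError("subsets differ; this would be a substitution")
    return FontSubset(
        a.name, a.weight, a.italic,
        {**a.widths, **b.widths},
        {**a.glyphs, **b.glyphs},
    )


first = FontSubset("Foo", 400, False, {1: 500, 2: 600}, {1: b"A", 2: b"B"})
second = FontSubset("Foo", 400, False, {2: 600, 3: 700}, {2: b"B", 3: b"C"})
impostor = FontSubset("Foo", 400, False, {2: 610, 3: 700}, {2: b"B", 3: b"C"})

merged = merge_subsets(first, second)
```

Here first and second agree on the glyph they share, so they merge into one subset covering glyphs 1 to 3. The impostor has the same name but a different width for glyph 2, so same_font rejects it and nothing is silently substituted.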

--
Craig Ringer
