Re: Fwd: Google Summer of Code

Craig Ringer Tue, 06 Mar 2012 06:57:00 -0800

On 03/06/2012 07:29 PM, Chris Bowditch wrote:

On 06/03/2012 11:08, mehdi houshmand wrote:
Hi Mehdi,
Font de-duping is intrinsically a post-process action: you need the
full document, with all fonts, before you can do any font de-duping.
PostScript does this very thing (to a much lesser extent) with the
<optimize-resources> tag, as a post-process action.
At least that is transparent to the user, but re-parsing the input is a sub-optimal solution as it incurs a performance penalty, so we should investigate if there are alternatives first. I can't recall why the PostScript Painter/Renderer was architected in that way, but that's a separate topic.

At a guess, because PostScript is much less capable of non-linear references and access than PDF is. It's more expensive and slower to forward-reference resources because PostScript has to parse and execute all the rest of the document to find the resource it wants, while PDF just seeks to the object at the byte offset referenced in the xref table and reads only the object it requires.

The requirements are perfectly clear: given a set of input PDFs and XSL-FO, create a single merged PDF with a consistent and unduplicated set of fonts. Why would there be slight kerning differences if the assumption that the font name is unique holds true?

Assuming the font name is unique is dangerous, since it's provably true that in the wild there are numerous subtly (and sometimes grossly) different fonts with the same name.

The font dictionary contains glyph metrics information that, along with the font name, slant, weight etc., can be used to match the font rather more closely. For extra caution, checksums of subset glyphs can be done to make sure they're *identical*, but honestly that's unnecessary if the metrics match.
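To make that concrete, here is a rough sketch of the matching I have in mind. It is only an illustration in Python, not FOP code: FontSubset, its fields and same_font are hypothetical names, and a real implementation would read these values from the PDF font dictionary and descriptor.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FontSubset:
    """Hypothetical, simplified view of one embedded font subset."""
    name: str
    weight: int
    italic: bool
    # glyph name -> (advance width, raw outline bytes)
    glyphs: dict = field(default_factory=dict)


def same_font(a, b, check_outlines=False):
    """Return True if nothing contradicts a and b being subsets of one font."""
    # Name, weight and slant must agree before metrics are even considered.
    if (a.name, a.weight, a.italic) != (b.name, b.weight, b.italic):
        return False
    # Compare metrics only for glyphs that both subsets contain.
    for glyph in a.glyphs.keys() & b.glyphs.keys():
        width_a, outline_a = a.glyphs[glyph]
        width_b, outline_b = b.glyphs[glyph]
        if width_a != width_b:
            return False
        # Optional extra caution: checksum the glyph data itself.
        if check_outlines:
            digest_a = hashlib.sha256(outline_a).hexdigest()
            digest_b = hashlib.sha256(outline_b).hexdigest()
            if digest_a != digest_b:
                return False
    return True
```

Note that two subsets with no glyphs in common pass this check, because nothing contradicts the match; a caller that wants positive evidence should also require a non-empty overlap.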

If that assumption is wrong then I agree with what you say. Ultimately that should be down to the user though; they know their fonts, so they can decide whether to merge them or not via a setting in the fop.xconf. Your argument is not sufficient to say this approach should never be used. It brings a lot of benefit to users who know their font names are unique.

It should be safe to do automatically and transparently by default, because only partially overlapping subsets of identical fonts should ever be merged. Anything else is a substitution, not a merging of duplicate subsets, and has entirely different considerations because of the possibility of visible changes caused by non-matching metrics etc.
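As a sketch of that rule (again Python for illustration only, with merge_subsets a hypothetical helper and each subset reduced to a mapping from glyph name to glyph data): merge when the shared glyphs are byte-for-byte identical, and refuse otherwise. Requiring at least one shared glyph as evidence is my own assumption here, not something the PDF format demands.

```python
def merge_subsets(a, b):
    """Return the union of two glyph subsets, or None if merging is unsafe.

    a and b map glyph names to glyph data (bytes) for two embedded subsets
    that already agree on font name, weight and slant.
    """
    shared = a.keys() & b.keys()
    if not shared:
        # Assumption: with no overlap there is no proof the fonts are identical.
        return None
    if any(a[glyph] != b[glyph] for glyph in shared):
        # Differing glyph data means this would be a substitution, not a merge.
        return None
    # Identical on the overlap: the union is a larger subset of the same font.
    return {**a, **b}
```

Returning None rather than guessing keeps the default behaviour transparent: a refused merge just leaves both subsets embedded, as they are today.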


--
Craig Ringer
