Re: Document and page callbacks for image handlers

Craig Ringer Wed, 21 Dec 2011 16:26:24 -0800

On 21/12/2011 5:07 PM, Chris Bowditch wrote:

> FOP can't currently fully embed a font in PDF, so even if you had the source font available the code changes required could be extensive. For us, this approach isn't an option because we don't have the source font to register in fop.xconf and embed. Therefore I am interested in knowing what you've come up with in terms of merging subsets together to create 1 super subset. That in my view is the most difficult challenge in this problem. Resolving the problems with the cross references and the point at which IDs are assigned should be solvable with a little code refactoring. I'm sure one of the guys will speak up if that's not the case.

As yet I haven't begun to tackle the actual merging of Type 1 or TrueType subsets into a single font. I've done the accumulation and merging of the widths arrays, but not the fonts themselves. I plan to make new minimum subsets from local fonts if they're available, and will try merging the actual embedded font files only if I can't get that to work or if I have time. I don't know font data structures well enough to want to try merging embedded subset font files if I can possibly avoid it.

I've just finished writing and testing the code to accumulate information on each font as it's encountered in a source PDF and merge it into a collection of font information keyed by (FontName, SubType, Encoding). I compare the metrics to ensure that the fonts are really compatible and, if they are, I merge the widths arrays and the startchar/endchar (FirstChar/LastChar) ranges to produce combined font information. At the end of the run I can now produce a font dictionary and font descriptor for the minimum subset required to satisfy every embedded document that uses that font.
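To make the accumulate-and-merge step concrete, here is a minimal Java sketch. `FontKey`, `FontInfo` and `FontRegistry` are hypothetical names, not FOP classes. It assumes a width of 0 in the /Widths array means "code unused", and it only compares ascent/descent plus overlapping widths as its compatibility test; real code would compare more of the font descriptor.

```java
import java.util.HashMap;
import java.util.Map;

// Grouping key for fonts seen across imported PDFs: (FontName, SubType, Encoding).
record FontKey(String fontName, String subType, String encoding) {}

// Hypothetical, simplified view of one simple font's dictionary and descriptor.
final class FontInfo {
    final int ascent;
    final int descent;
    int firstChar;
    int lastChar;
    int[] widths; // widths[i] = width of code (firstChar + i); 0 is assumed to mean "unused"

    FontInfo(int ascent, int descent, int firstChar, int[] widths) {
        this.ascent = ascent;
        this.descent = descent;
        this.firstChar = firstChar;
        this.lastChar = firstChar + widths.length - 1;
        this.widths = widths.clone();
    }

    int widthOf(int code) {
        return (code < firstChar || code > lastChar) ? 0 : widths[code - firstChar];
    }

    // Compatible if descriptor metrics match and no code has two different non-zero widths.
    boolean compatibleWith(FontInfo other) {
        if (ascent != other.ascent || descent != other.descent) {
            return false;
        }
        for (int c = other.firstChar; c <= other.lastChar; c++) {
            int a = widthOf(c);
            int b = other.widthOf(c);
            if (a != 0 && b != 0 && a != b) {
                return false;
            }
        }
        return true;
    }

    // Widen FirstChar/LastChar to cover both fonts and take the union of the widths.
    void mergeFrom(FontInfo other) {
        int first = Math.min(firstChar, other.firstChar);
        int last = Math.max(lastChar, other.lastChar);
        int[] merged = new int[last - first + 1];
        for (int c = first; c <= last; c++) {
            int w = widthOf(c);
            merged[c - first] = (w != 0) ? w : other.widthOf(c);
        }
        firstChar = first;
        lastChar = last;
        widths = merged;
    }
}

final class FontRegistry {
    private final Map<FontKey, FontInfo> fonts = new HashMap<>();

    // Returns false when the font clashes with the one already registered under
    // the same key; the caller would then copy that font over verbatim.
    boolean accumulate(FontKey key, FontInfo info) {
        FontInfo existing = fonts.get(key);
        if (existing == null) {
            fonts.put(key, info);
            return true;
        }
        if (!existing.compatibleWith(info)) {
            return false;
        }
        existing.mergeFrom(info);
        return true;
    }

    FontInfo get(FontKey key) {
        return fonts.get(key);
    }
}
```

For example, a Helvetica subset covering codes 65-66 merged with one covering 66-67 ends up with FirstChar 65, LastChar 67 and three widths; a third subset that gives code 65 a different width is rejected.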

I can report on font usage, glyph usage within each font, and potential size savings, but I don't yet have it actually replacing the fonts. That's what I'll be working on today. First I'll try to use fop's font embedding mechanism to do it, which will require adding some callbacks to fop's PDF output so I can run code just before the resource dictionary is written out and inform fop of the required glyphs. I'll be delaying the writing of all the XObject resource dictionaries until after fop's resource dictionary is written, so I know the fop font object IDs and can put them in the XObject resource dictionaries. With luck I'll be able to write out the minimum subset, but I haven't looked into fop's font embedding code in enough detail to be sure exactly what I can do or how, so I'll be going delving shortly.
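A rough sketch of that two-phase idea, assuming a hypothetical `ResourceCallback` hook (this is not an existing FOP interface): each imported XObject registers its font references and glyph codes up front, the callback hands the glyph sets over just before the main resource dictionary is written, and the XObject resource dictionaries are only rendered once the font object numbers are known.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// HYPOTHETICAL hook, not part of FOP: called just before the main resource
// dictionary is written so importers can say which glyphs each font needs.
interface ResourceCallback {
    void beforeResourcesWritten(Map<String, Set<Integer>> requiredGlyphs);
}

// Buffers each imported XObject's font references instead of writing its
// resource dictionary immediately.
final class DeferredXObjectResources implements ResourceCallback {
    private final Map<String, Set<Integer>> glyphsUsed = new TreeMap<>();
    // One entry per XObject: resource name (e.g. "F1") -> font name.
    private final List<Map<String, String>> pending = new ArrayList<>();

    void addXObject(Map<String, String> fontRefs, Map<String, Set<Integer>> glyphs) {
        pending.add(new TreeMap<>(fontRefs));
        glyphs.forEach((font, codes) ->
                glyphsUsed.computeIfAbsent(font, f -> new TreeSet<>()).addAll(codes));
    }

    // Phase 1: report the accumulated glyph usage to the font embedder.
    @Override
    public void beforeResourcesWritten(Map<String, Set<Integer>> requiredGlyphs) {
        glyphsUsed.forEach((font, codes) ->
                requiredGlyphs.computeIfAbsent(font, f -> new TreeSet<>()).addAll(codes));
    }

    // Phase 2: font object numbers are now known, so the deferred
    // XObject /Resources dictionaries can finally be written.
    List<String> render(Map<String, Integer> fontObjectNumbers) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> refs : pending) {
            StringBuilder sb = new StringBuilder("<< /Font <<");
            refs.forEach((res, font) -> sb.append(" /").append(res)
                    .append(' ').append(fontObjectNumbers.get(font)).append(" 0 R"));
            out.add(sb.append(" >> >>").toString());
        }
        pending.clear();
        return out;
    }
}
```

The cost of this design is that every pending XObject's font references stay in memory until the end of the run.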

If this approach works, the next step will be to allocate font object IDs early, so I don't need to waste memory on delaying XObject resource dictionary writes, and so I can avoid writing keys to fop's resource dictionary for fonts fop itself never uses.
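Allocating object IDs early could look roughly like the sketch below. `EarlyFontObjectAllocator` is hypothetical; in practice the numbers would have to come from whatever allocates object numbers for the rest of the document. It relies on the fact that a PDF indirect reference such as `40 0 R` may be written before object 40 itself, because readers resolve it through the cross-reference table.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// HYPOTHETICAL allocator (not FOP's API). Reserves a PDF object number the first
// time a font key is seen, so XObject resource dictionaries can reference the
// font immediately and the font object itself can be written at the end.
final class EarlyFontObjectAllocator {
    private int nextObjectNumber;
    private final Map<String, Integer> reserved = new LinkedHashMap<>();

    EarlyFontObjectAllocator(int firstFreeObjectNumber) {
        this.nextObjectNumber = firstFreeObjectNumber;
    }

    // The same font key always maps to the same object number.
    int reserve(String fontKey) {
        return reserved.computeIfAbsent(fontKey, k -> nextObjectNumber++);
    }

    // Indirect reference usable in an XObject's /Font resource dictionary.
    String reference(String fontKey) {
        return reserve(fontKey) + " 0 R";
    }

    // Fonts whose objects still have to be written, in reservation order.
    Map<String, Integer> pendingFonts() {
        return Collections.unmodifiableMap(reserved);
    }
}
```

Because only the imported content's resource dictionaries ever see these references, fonts that fop itself never uses need no entry in fop's own resource dictionary.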

Yesterday I attempted to unembed base-14 fonts during import of PDF content: I'd recognise fonts like Helvetica embedded as Type 1 and replace the embedded font's dictionary with a font dictionary that simply references the base-14 font. Acrobat choked on the result, even though it looked OK structurally, and I'm not sure quite what was wrong. I hope to have more luck with re-embedding than with replacement by a base-14 font.
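For reference, a dictionary that merely names one of the standard 14 fonts needs only /Type, /Subtype, /BaseFont and optionally /Encoding, with no /FontDescriptor or embedded font file. The helper below is a hypothetical sketch that strips a six-letter subset tag (as in `ABCDEF+Helvetica`) and builds such a dictionary; it makes no attempt to diagnose the Acrobat failure.

```java
import java.util.Set;
import java.util.regex.Pattern;

// HYPOTHETICAL helper for replacing an embedded standard font with a reference.
final class Base14 {
    // The standard 14 Type 1 font names defined by the PDF specification.
    static final Set<String> NAMES = Set.of(
            "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
            "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
            "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
            "Symbol", "ZapfDingbats");

    // Subset fonts are named with a six-uppercase-letter tag plus '+'.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    static String stripSubsetTag(String baseFont) {
        return SUBSET_TAG.matcher(baseFont).replaceFirst("");
    }

    static boolean isBase14(String baseFont) {
        return NAMES.contains(stripSubsetTag(baseFont));
    }

    // Builds a non-embedded font dictionary; pass null to omit /Encoding
    // (e.g. for Symbol and ZapfDingbats, which use their built-in encodings).
    static String referenceDictionary(String baseFont, String encoding) {
        String name = stripSubsetTag(baseFont);
        if (!NAMES.contains(name)) {
            throw new IllegalArgumentException("not a base-14 font: " + baseFont);
        }
        StringBuilder sb = new StringBuilder("<< /Type /Font /Subtype /Type1 /BaseFont /").append(name);
        if (encoding != null) {
            sb.append(" /Encoding /").append(encoding);
        }
        return sb.append(" >>").toString();
    }
}
```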

On a side note, I also need to enhance the font info collection code so it keys on more of the font metrics. Currently the first font with a given (FontName, SubType, Encoding) tuple is registered for that key, and if subsequent fonts with the same key but incompatible metrics are encountered, they're copied over verbatim, exactly as is currently the case. Expanding the key to cover the font bbox, ascent, descent, etc. will help solve that and won't be hard; I'm just leaving it until I have a proof-of-concept font re-embed working.
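The expanded key could simply be a record, so that fonts differing in bbox, ascent or descent land under different keys instead of clashing. This is a hypothetical sketch; storing the bbox as four ints rather than an `int[]` keeps the record's generated `equals`/`hashCode` value-based (arrays would compare by reference).

```java
import java.util.HashMap;
import java.util.Map;

// HYPOTHETICAL extended key: name, subtype, encoding AND descriptor metrics.
record ExtendedFontKey(String fontName, String subType, String encoding,
                       int bboxLlx, int bboxLly, int bboxUrx, int bboxUry,
                       int ascent, int descent) {}

// Counts how often each distinct font variant is encountered.
final class FontVariants {
    private final Map<ExtendedFontKey, Integer> occurrences = new HashMap<>();

    // Returns the number of times this exact variant has now been seen.
    int record(ExtendedFontKey key) {
        return occurrences.merge(key, 1, Integer::sum);
    }

    int distinctVariants() {
        return occurrences.size();
    }
}
```

Two Helvetica subsets with identical metrics share one entry, while one with a different ascent gets its own entry and can be merged with its own compatible peers.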


--
Craig Ringer
