Re: Document and page callbacks for image handlers

Craig Ringer Sun, 18 Dec 2011 23:21:33 -0800

- A clean way to associate data that's private to the image processing plugin with a particular rendering run so I can access it across multiple invocations of the plugin; and

For anyone else who needs this later: there doesn't appear to be any especially nice way to do this with FOP's current image handler API. There's no general-purpose map on the user agent for image handlers to stash their data in, and nothing like that is passed as a parameter to the image handler calls. The hints mechanism can pass data from a preloader to a loader for the same image, but it can't be used to pass data between image loaders.

What I've ended up doing is keying a WeakHashMap off the FOUserAgent for the rendering run, as obtained via the RenderingContext passed to ImageHandler.handleImage(...). So long as lookups and insertions on the WeakHashMap are synchronized, this is safe, and it will release the image handler's per-render information when the FOUserAgent is discarded at the end of the rendering run.
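A minimal sketch of that approach, for illustration only. The class name PerRunState and its methods are hypothetical, and the type parameter K stands in for FOUserAgent so the example compiles without FOP on the classpath. It assumes the key uses identity-style equals/hashCode, since WeakHashMap compares keys with equals().

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

/**
 * Per-rendering-run state for an image handler (hypothetical helper).
 * K stands in for FOUserAgent; V is whatever the handler wants to keep
 * for the duration of one rendering run.
 */
final class PerRunState<K, V> {
    // WeakHashMap is not thread-safe, so every access goes through
    // synchronized methods on this wrapper.
    private final Map<K, V> states = new WeakHashMap<>();

    /** Returns the state for this run, creating it on first use. */
    synchronized V get(K userAgent, Supplier<V> factory) {
        V state = states.get(userAgent);
        if (state == null) {
            state = factory.get();
            states.put(userAgent, state);
        }
        return state;
    }

    /** Number of runs currently tracked (may shrink after GC). */
    synchronized int size() {
        return states.size();
    }
}
```

Inside an image handler the key would be the user agent taken from the rendering context. One caveat: the stored value must not hold a strong reference back to the key, or the entry will never be collected.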

I'm now able to accumulate font usage information from the PDFs I examine as I embed them and build a list of which fonts are used. I can combine width arrays and first/last char listings to determine which glyphs are required if the font is to be embedded as a subset.
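A rough sketch of how that accumulation could look. FontUsage and its methods are hypothetical names, not FOP classes. It assumes each input font dictionary supplies a FirstChar value and a Widths array covering FirstChar..LastChar, and it treats a non-zero width as "this character code is used", which is only a heuristic. Mapping character codes to actual glyphs via the font's encoding is left out.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Accumulates used character codes per font across input PDFs (hypothetical). */
final class FontUsage {
    private final Map<String, BitSet> usedCodes = new HashMap<>();

    /**
     * Records one font dictionary's FirstChar and Widths entries.
     * widths[i] is the width of character code firstChar + i.
     * Assumption: a zero width means the code is unused in that subset.
     */
    void record(String fontName, int firstChar, int[] widths) {
        BitSet codes = usedCodes.computeIfAbsent(fontName, k -> new BitSet());
        for (int i = 0; i < widths.length; i++) {
            if (widths[i] != 0) {
                codes.set(firstChar + i);
            }
        }
    }

    /** Returns a copy of the union of codes seen so far for the font. */
    BitSet codesFor(String fontName) {
        BitSet codes = usedCodes.get(fontName);
        return codes == null ? new BitSet() : (BitSet) codes.clone();
    }
}
```

Subset fonts in the input PDFs normally carry a six-letter tag prefix on the font name, so the names would need that prefix stripped before being used as keys here.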

- How to append some additional PDF objects after the last page is emitted but before the PDF document trailer and final xref table(s) are written out.


For anyone else looking at this now or later:

It's possible to allocate a PDFObject and request that it be written out at the end of the document. PDFDocument.outputTrailer(...) writes the objects that were added to the trailer list. Such objects are allocated via the factory, where they are given an object ID, and are then passed to addTrailerObject(...) to request that they be written out at the end of document production. If I ever start producing my own combined font subsets from the original subset fonts in the input PDFs, this is probably how I'd insert the combined font subset object.
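To make the "allocate the ID early, write the object late" idea concrete, here is a toy model. It is not FOP code: ToyPdfWriter and all of its methods are invented for illustration and only mimic the behaviour described above, and the output is not a complete PDF.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy writer showing early object numbering with deferred output (hypothetical). */
final class ToyPdfWriter {
    private int nextObjectNumber = 1;
    private final StringBuilder out = new StringBuilder();
    private final List<String> deferred = new ArrayList<>();

    /** Reserves an object number without writing anything yet. */
    int allocate() {
        return nextObjectNumber++;
    }

    /** Writes an object to the output immediately. */
    void writeNow(int number, String body) {
        out.append(serialize(number, body));
    }

    /** Queues an object to be written just before the trailer. */
    void addTrailerObject(int number, String body) {
        deferred.add(serialize(number, body));
    }

    /** Flushes queued objects, then writes a placeholder trailer marker. */
    String finish() {
        for (String object : deferred) {
            out.append(object);
        }
        deferred.clear();
        out.append("trailer\n");
        return out.toString();
    }

    private static String serialize(int number, String body) {
        return number + " 0 obj\n" + body + "\nendobj\n";
    }
}
```

Because the font's number is reserved up front, a form's resource dictionary can be written immediately with a forward reference such as "/F1 1 0 R", while the font object itself only appears at the end of the output.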

If I restrict font combining to fonts where FOP has an original font file, and use FOP's font subsystem, the above would require too much duplication and make it hard to avoid embedding fonts twice (once for form XObjects, once for main content). Instead I need to mark a font as used in FOP's FontInfo for the rendering run so FOP writes it out, and I need to obtain the font object's PDF object ID so I can write forward references to it in the form XObjects' resource dictionaries.

The problem here is that FOP doesn't assign fonts an object ID until very late in writing. The first reference to font objects is from the resource dictionary, and FOP only writes one of those: it is shared between all pages and written out just before the trailer. Since fonts are written out with the resource dictionary, and don't usually need object IDs until the resource dictionary has to reference them, there's no way to get their object IDs earlier in PDF production. This changes when we need to write private resource dictionaries for embedded form XObjects.

I'm looking at forcing early embedding of fonts with direct makeFont(...) calls. This'll work so long as I'm happy embedding whole fonts, but will prevent FOP from subsetting the font for its own use and prevent me from subsetting it for form XObjects.

Alternatively, I could defer writing the form XObject resource dictionaries until the end of the document so I wouldn't need to know the font object IDs early, but I'd still need a way to write them *after* the main FOP resource dictionary. If I wanted to subset, I'd also need a hook just before FOP writes out the fonts so I could adjust the glyph width tables. I don't see any way around this without some kind of PDF renderer listener for image handlers etc. to use.

I'll try to put together a proof of concept that embeds whole fonts if the font is found in a PDF form XObject, de-duplicating references so all PDF form XObjects that use that font reference the same one. FOP will use the same font since it knows about it and has stored it in the used fonts map, so the only problem is that the whole font is embedded rather than a subset.


Anyone working on the same thing, please feel free to drop me a note.

--
Craig Ringer
