- A clean way to associate data that's private to the image processing
plugin with a particular rendering run so I can access it across
multiple invocations of the plugin; and

For anyone else who needs this later: there doesn't appear to be any
especially nice way to do this with FOP's current image handler API, as
there's no general-purpose map on the user agent for image handlers to
stash their data in, and nothing like that is passed as a parameter to
the image handler calls. The hints mechanism can pass data from a
preloader to a loader for the same image, but it can't be used to pass
data between image loaders.

What I've ended up doing is keying a WeakHashMap off the FOUserAgent
for the rendering run, as obtained via the RenderingContext passed to
ImageHandler.handleImage(...). So long as lookups and insertions on the
WeakHashMap are synchronized, this is safe, and the map will release the
image handler's per-render information once the FOUserAgent is discarded
at the end of the rendering run.
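
A minimal sketch of that WeakHashMap approach, in plain Java. So that it
compiles without FOP on the classpath, the key is a plain Object standing
in for the FOUserAgent; PerRunState and its method names are hypothetical
and not part of FOP.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

/**
 * Per-rendering-run storage, keyed weakly on a "run" object.
 * In FOP the key would be the FOUserAgent; here it is any Object
 * (an assumption made to keep the sketch self-contained).
 */
final class PerRunState<V> {
    // WeakHashMap is not thread-safe, so every access goes through
    // the synchronized methods below.
    private final Map<Object, V> states = new WeakHashMap<>();

    /** Returns the state for this run, creating it on first use. */
    synchronized V get(Object runKey, Supplier<V> factory) {
        V state = states.get(runKey);
        if (state == null) {
            state = factory.get();
            states.put(runKey, state);
        }
        return state;
    }

    /** Number of runs currently tracked (may shrink after a GC). */
    synchronized int size() {
        return states.size();
    }
}
```

One caveat: the stored value must not hold a strong reference back to
the key, or the entry will never become eligible for collection.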

With that in place, I'm now able to accumulate font usage information
from the PDFs I examine as I embed them and build a list of which fonts
are used. I can combine the width arrays and first/last char entries to
determine which glyphs are required if the font is to be embedded as a
subset.
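
A rough sketch of how the width arrays and first/last char values could
be merged for one simple (single-byte) font. GlyphUsage is a hypothetical
helper, not FOP code, and it assumes that a zero width marks a code the
source subset does not contain, which is common for subset fonts but not
guaranteed.

```java
import java.util.BitSet;

/** Accumulates which character codes one font needs across several PDFs. */
final class GlyphUsage {
    private final BitSet codes = new BitSet(256);

    /**
     * Merges one font dictionary's /FirstChar, /LastChar and /Widths.
     * Assumption: a zero width means the code is absent from the subset.
     */
    void merge(int firstChar, int lastChar, int[] widths) {
        // /Widths holds one entry per code in [firstChar, lastChar].
        if (widths.length != lastChar - firstChar + 1) {
            throw new IllegalArgumentException("widths length does not match char range");
        }
        for (int i = 0; i < widths.length; i++) {
            if (widths[i] != 0) {
                codes.set(firstChar + i);
            }
        }
    }

    /** Returns a copy of the set of required character codes. */
    BitSet required() {
        return (BitSet) codes.clone();
    }
}
```

Calling merge(...) once per embedded PDF that uses the font leaves
required() holding the union of the codes seen so far.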

- How to append some additional PDF objects after the last page is
emitted but before the PDF document trailer and final xref table(s)
are written out.

For anyone else looking at this now or later:

It's possible to allocate a PDFObject and request that it be written out
at the end of the document. PDFDocument.outputTrailer(...) writes the
objects that were added to the trailer list. Those objects are allocated
via the factory, which gives them an object ID, and are then passed to
addTrailerObject(...) to request that they be written out at the end of
document production. If I ever start producing my own combined font
subsets from the original subset fonts in the input PDFs, this is
probably how I'd insert the combined font subset object.
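
To illustrate the idea of allocating an object ID early and writing the
object late, here is a toy model in plain Java. ToyPdfDoc is a
hypothetical stand-in and does not use or reproduce FOP's PDFDocument or
factory classes; it only shows why an ID in hand is enough to write a
forward reference.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a PDF writer; NOT FOP's PDFDocument. */
final class ToyPdfDoc {
    private int nextId = 1;
    private final StringBuilder out = new StringBuilder();
    private final List<String> trailerObjects = new ArrayList<>();

    /** Reserves an object number without writing anything. */
    int allocateId() {
        return nextId++;
    }

    /** Writes an object to the output immediately. */
    void writeNow(int id, String body) {
        out.append(format(id, body));
    }

    /** Queues an object to be written just before the trailer. */
    void addTrailerObject(int id, String body) {
        trailerObjects.add(format(id, body));
    }

    /** Flushes the queued objects, then an abbreviated trailer marker. */
    String finish() {
        for (String obj : trailerObjects) {
            out.append(obj);
        }
        out.append("trailer\n");
        return out.toString();
    }

    private static String format(int id, String body) {
        return id + " 0 obj\n" + body + "\nendobj\n";
    }
}
```

With this model, a form's resource dictionary can be written straight
away containing "<id> 0 R" for a font whose object is only queued, and
the font body then appears after all page content but before the trailer.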

If I restrict font combining to fonts where FOP has an original font
file and use FOP's font subsystem, however, the above would require too
much duplication and make it hard to avoid embedding fonts twice (once
for form XObjects, once for main content). Instead I need to mark a font
as used in FOP's FontInfo for the rendering run so FOP writes it out,
and I need to obtain the font object's PDF object ID so I can write
forward references to it in the form XObjects' resource dictionaries.

The problem here is that FOP doesn't assign fonts an object ID until
very late in writing. The first reference to font objects is from the
resource dictionary, and FOP only writes one of those: it is shared
between all pages and written out just before the trailer. Since fonts
are written out with the resource dictionary and don't usually need
object IDs until the resource dictionary has to reference them, there's
no way to get their object IDs earlier in PDF production. That
assumption breaks down once we need to write private resource
dictionaries for embedded form XObjects, which must reference the fonts
earlier.

I'm looking at forcing early embedding of fonts with direct
makeFont(...) calls. This will work so long as I'm happy embedding whole
fonts, but it will prevent FOP from subsetting the font for its own use
and prevent me from subsetting it for form XObjects.

Alternatively, I could defer writing the form XObject resource
dictionaries until the end of the document so that I wouldn't need to
know the font object IDs early, but I'd still need a way to write them
*after* the main FOP resource dictionary. If I wanted to subset, I'd
also need a hook just before FOP writes out the fonts so I could adjust
the glyph width tables. I don't see any way around this without some
kind of PDF renderer listener for image handlers etc. to use.

I'll try to put together a proof of concept that embeds whole fonts if
the font is found in a PDF form XObject, de-duplicating references so
that all PDF form XObjects using that font reference the same one. FOP
will use the same font, since it knows about it and has stored it in the
used fonts map, so the only problem is that the whole font is embedded
rather than a subset.

Anyone working on the same thing, please feel free to drop me a note.

--
Craig Ringer