Hi

I've nearly finished work on getting fop-pdf-images to overlay PDFs by
appending their content streams and merging their resource dictionaries,
rather than by creating XObject Forms. The problem I have left will be
more intrusive into the fop codebase than what I've had to do so far, so
I thought I'd check in before I start working on it.

The reason I'm adapting fop-pdf-images to support "merging" PDF images
into the main PDF content instead of using XObject Forms is that the use
of lots of PDF XObject Forms seems to cause RIPs and clients to perform
poorly or run out of memory. The way I propose to do it, fop-pdf-images
will use an XObject Form if the preloader sees a PDF image re-used more
than a configurable number of times (one by default), and otherwise
merge it into the main PDF.
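
To make the rule concrete, here's a rough sketch of the decision. The
class and method names are made up for illustration and don't exist in
fop or fop-pdf-images, and it assumes all uses of an image have been
counted before the strategy is queried.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of fop or fop-pdf-images.
class PdfImageUsageTracker {
    enum Strategy { MERGE_INTO_PAGE, XOBJECT_FORM }

    // Configurable threshold: an image used more often than this
    // is embedded as an XObject Form instead of being merged.
    private final int maxMergedUses;
    private final Map<String, Integer> useCounts = new HashMap<>();

    PdfImageUsageTracker() {
        this(1); // default from the proposal: one use
    }

    PdfImageUsageTracker(int maxMergedUses) {
        this.maxMergedUses = maxMergedUses;
    }

    // Called once for every reference to the image that is seen.
    void recordUse(String imageUri) {
        useCounts.merge(imageUri, 1, Integer::sum);
    }

    Strategy strategyFor(String imageUri) {
        int uses = useCounts.getOrDefault(imageUri, 0);
        return uses > maxMergedUses
                ? Strategy.XOBJECT_FORM
                : Strategy.MERGE_INTO_PAGE;
    }
}
```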

Most of that is done, but there's a problem with ensuring unique
resource names.

XObject Form resource dictionaries are their own namespace, so no
resource name (font, ExtGState, etc.) in an XObject Form can conflict
with a name in the parent page's resource dictionary. If XObject Forms
are no longer used by fop-pdf-images, that namespace separation goes
away. I have to merge the resource dictionaries of the "image" page(s)
into the resource dictionary of the page they're being overlaid on. In the
case of fop, that's the global resource dictionary because fop doesn't
currently write per-page resource dictionaries. There's nothing wrong
with this beyond potentially making the resource dictionary a bit fat,
but it means I need a way to guarantee that a name will not conflict
with any other name assigned by fop.

For GState dictionary objects that's easy; fop just uses "GS"+object
number as the name, so if I follow the same scheme when copying
resources I'm guaranteed to get a unique name since object numbers are
unique.
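
As a sketch of what that looks like when copying, the helper below
builds a rename map from the names used in the imported page to
"GS"+object number names. The class is hypothetical, and it assumes the
caller already knows which object number each copied ExtGState was
given in the target document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of fop or fop-pdf-images.
class GStateRenamer {

    // Naming scheme described above: "GS" followed by the object number.
    static String nameFor(int objectNumber) {
        return "GS" + objectNumber;
    }

    // Input: name used in the imported page -> object number assigned
    // to the copied ExtGState in the target document (assumed known).
    // Output: name used in the imported page -> name to use after merging.
    static Map<String, String> buildRenameMap(
            Map<String, Integer> importedToNewObjectNumber) {
        Map<String, String> renames = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e
                : importedToNewObjectNumber.entrySet()) {
            renames.put(e.getKey(), nameFor(e.getValue()));
        }
        return renames;
    }
}
```

The same map has to be applied to the imported content stream, since
operands such as "/GS0 gs" refer to resources by name.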

Unfortunately, fop doesn't do anything so consistent for fonts or most
other resources, and that's made it nearly impossible for me to
guarantee that I can use a name without a later part of the XSL-FO
causing fop to create an object that tries to use the same name. Solving
this will require some changes to the way fop writes the PDF resources
dictionary.

I propose that the PDFResources class should take responsibility for
allocating resource names and ensuring they're consistent. Instead of
asking each resource what its name is, the PDFResources class should
*assign* it a name. Those names can be minimal and compact - e.g. "Fnn"
for fonts, "GSnn" for graphics states, etc. "nn" would be a counter
maintained by PDFResources. That's the convention followed by most other
PDF producing software and would make it simple and reliable to inject
objects not created by fop into the resources dictionary without risk of
conflicts.
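
Roughly, the allocation I have in mind looks like the sketch below. It
is a standalone illustration, not the current PDFResources API: the
class, enum and method names are invented, and it keeps one counter per
resource category, which is enough because each category lives in its
own sub-dictionary of the resources dictionary.

```java
import java.util.EnumMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch; not the existing org.apache.fop.pdf.PDFResources.
class ResourceNameAllocator {

    enum Category {
        FONT("F"), GSTATE("GS");

        final String prefix;

        Category(String prefix) {
            this.prefix = prefix;
        }
    }

    // Last index handed out for each category.
    private final Map<Category, Integer> lastIndex =
            new EnumMap<>(Category.class);
    // Names already assigned, keyed by object identity.
    private final Map<Object, String> assigned = new IdentityHashMap<>();

    // Returns the name for a resource, assigning a new one on first use.
    String nameFor(Category category, Object resource) {
        String existing = assigned.get(resource);
        if (existing != null) {
            return existing;
        }
        int n = lastIndex.merge(category, 1, Integer::sum);
        String name = category.prefix + n;
        assigned.put(resource, name);
        return name;
    }
}
```

A resource copied in from another PDF would go through the same
nameFor() call as one created by fop, so neither needs to know how the
other picks names.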

That'll be important if people want to be able to write extensions that
add new, custom PDF content; it's not just useful for fop-pdf-images.

This API change would only affect extensions, services and clients that
work directly with classes in the org.apache.fop.pdf and
org.apache.fop.render.pdf packages, and only some of those. Clients
that use the main fop APIs would be completely unaffected, as would
clients that use the area tree / IR code, image loader code, or pretty
much anything except the guts of pdf handling.

I'll post a proposed patch soon, along with patches for some other
changes that enable what I'm doing but may be useful for others. A patch
with the fop-pdf-images "merge" feature support will follow once I've
finished it enough that I can do test-runs.

--
Craig Ringer
