Hi,

I've nearly finished work on getting fop-pdf-images to overlay PDFs by appending their content streams and merging their resource dictionaries, rather than by creating XObject Forms. The one remaining problem will be more intrusive into the fop codebase than anything I've had to do so far, so I thought I'd check in before I start working on it.
The reason I'm adapting fop-pdf-images to support "merging" PDF images into the main PDF content instead of using XObject Forms is that the use of lots of PDF XObject Forms seems to cause RIPs and clients to perform poorly or run out of memory. The way I propose to do it, fop-pdf-images will use an XObject Form if the preloader sees a PDF image re-used more than a configurable number of times (one by default), and otherwise merge it into the main PDF.

Most of that is done, but there's a problem with ensuring unique resource names. Each XObject Form's resource dictionary is its own namespace, so a resource name (font, ExtGState, etc) in an XObject Form cannot conflict with a name in the parent page's resource dictionary. If fop-pdf-images no longer uses XObject Forms, that namespace separation goes away. I have to merge the resource dictionaries of the "image" page(s) into the resource dictionary of the page they're being overlaid on. In the case of fop, that's the global resource dictionary, because fop doesn't currently write per-page resource dictionaries. There's nothing wrong with this beyond potentially making the resource dictionary a bit fat, but it means I need a way to guarantee that a name will not conflict with any other name assigned by fop.

For GState dictionary objects that's easy: fop just uses "GS" + object number as the name, so if I follow the same scheme when copying resources I'm guaranteed to get a unique name, since object numbers are unique. Unfortunately, fop doesn't do anything so consistent for fonts or most other resources. That makes it nearly impossible for me to guarantee that a name I pick won't be claimed again later, when a subsequent part of the XSL-FO causes fop to create an object that tries to use the same name.

Solving this will require some changes to the way fop writes the PDF resources dictionary. I propose that the PDFResources class should take responsibility for allocating resource names and ensuring they're consistent.
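To make the idea concrete, here is a minimal sketch of the kind of name allocation I have in mind. It is an illustration only, not existing fop code: the class name ResourceNameAllocator and its reserve/allocate methods are made up for this mail, and it assumes one counter per name prefix plus a set of names already in use.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not part of fop): hands out resource names that are
 * unique within a single resources dictionary.
 */
final class ResourceNameAllocator {
    /** Last counter value handed out for each prefix, e.g. "F" or "GS". */
    private final Map<String, Integer> counters = new HashMap<>();
    /** Every name that has been allocated or reserved so far. */
    private final Set<String> used = new HashSet<>();

    /**
     * Marks a name as taken, e.g. one that is already in the dictionary.
     * Returns false if the name was already in use.
     */
    boolean reserve(String name) {
        return used.add(name);
    }

    /** Returns the next unused name for the prefix, e.g. "F1", "F2", "GS1". */
    String allocate(String prefix) {
        int n = counters.getOrDefault(prefix, 0);
        String name;
        do {
            n++;
            name = prefix + n;
        } while (!used.add(name)); // skip over names that are already taken
        counters.put(prefix, n);
        return name;
    }
}
```

With something like this, code that merges an imported page's font dictionary would ask for allocate("F") for each imported font and rewrite the references in the copied content stream to match, instead of trusting the names used in the source PDF.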
Instead of asking each resource what its name is, the PDFResources class should *assign* it a name. Those names can be minimal and compact, e.g. "Fnn" for fonts, "GSnn" for graphics states, etc., where "nn" is a counter maintained by PDFResources. That's the convention followed by most other PDF-producing software, and it would make it simple and reliable to inject objects not created by fop into the resources dictionary without risk of conflicts. That'll be important if people want to be able to write extensions that add new, custom PDF content; it's not just useful for fop-pdf-images.

This API change would only affect extensions, services and clients that work directly with classes in the org.apache.fop.pdf and org.apache.fop.render.pdf packages, and only some of those. Clients that use the main fop APIs would be completely unaffected, as would clients that use the area tree / IR code, image loader code, or pretty much anything except the guts of PDF handling.

I'll post a proposed patch soon, along with patches for some other changes that enable what I'm doing but may be useful for others. A patch with the fop-pdf-images "merge" feature support will follow once I've finished it enough that I can do test runs.

--
Craig Ringer