Hi Craig,

My sincerest apologies for not getting round to looking at what you've done here. I'll try and take a look in the next few days and give it a think, see if there's anything we can do to help.

Apologies once again,

Mehdi

On 28 March 2012 07:39, Craig Ringer wrote:
> Hi
>
> I've nearly finished work on getting fop-pdf-images to overlay PDFs by
> appending their content streams and merging their resource dictionaries,
> rather than by creating XObject Forms. The problem I have left will be
> more intrusive into the fop codebase than what I've had to do so far, so
> I thought I'd check in before I start working on it.
>
> The reason I'm adapting fop-pdf-images to support "merging" PDF images
> into the main PDF content instead of using XObject Forms is that the use
> of lots of PDF XObject Forms seems to cause RIPs and clients to perform
> poorly or run out of memory. The way I propose to do it, fop-pdf-images
> will use an XObject Form if the preloader sees a PDF image re-used more
> than a configurable number of times (one by default), and otherwise
> merge it into the main PDF.
>
> Most of that is done, but there's a problem with ensuring unique
> resource names.
>
> XObject Form resource dictionaries are their own namespace, so no
> resource name (font, ExtGState, etc) in an XObject Form can conflict
> with a name in the parent page's resource dictionary. If XObject Forms
> are no longer used by fop-pdf-images, that namespace separation goes
> away. I have to merge the "image" page(s)'s resource dictionaries into
> the resource dictionary of the page they're being overlaid over. In the
> case of fop, that's the global resource dictionary because fop doesn't
> currently write per-page resource dictionaries. There's nothing wrong
> with this beyond potentially making the resource dictionary a bit fat,
> but it means I need a way to guarantee that a name will not conflict
> with any other name assigned by fop.
>
> For GState dictionary objects that's easy; fop just uses "GS"+object
> number as the name, so if I follow the same scheme when copying
> resources I'm guaranteed to get a unique name since object numbers are
> unique.
>
> Unfortunately, fop doesn't do anything so consistent for fonts or most
> other resources, and that's made it nearly impossible for me to
> guarantee that I can use a name without a later part of the XSL-FO
> causing fop to create an object that tries to use the same name. Solving
> this will require some changes to the way fop writes the PDF resources
> dictionary.
>
> I propose that the PDFResources class should take responsibility for
> allocating resource names and ensuring they're consistent. Instead of
> asking each resource what its name is, the PDFResources class should
> *assign* it a name. Those names can be minimal and compact, e.g. "Fnn"
> for fonts, "GSnn" for graphics states, etc. "nn" would be a counter
> maintained by PDFResources. That's the convention followed by most other
> PDF producing software and would make it simple and reliable to inject
> objects not created by fop into the resources dictionary without risk of
> conflicts.
>
> That'll be important if people want to be able to write extensions that
> add new, custom PDF content; it's not just useful for fop-pdf-images.
>
> This API change would only affect extensions, services and clients that
> work directly with org.apache.fop.pdf.* and
> org.apache.fop.render.pdf.* classes, and only some of those. Clients
> that use the main fop APIs would be completely unaffected, as would
> clients that use the area tree / IR code, image loader code, or pretty
> much anything except the guts of pdf handling.
>
> I'll post a proposed patch soon, along with patches for some other
> changes that enable what I'm doing but may be useful for others. A patch
> with the fop-pdf-images "merge" feature support will follow once I've
> finished it enough that I can do test-runs.
>
> --
> Craig Ringer
>
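
For anyone following the thread, the naming scheme described in the quoted proposal (a central owner hands out "Fnn", "GSnn" and similar names from a counter, so injected resources cannot collide with fop's own) might look roughly like the sketch below. This is only an illustration under stated assumptions: ResourceNameAllocator, assign() and reserve() are hypothetical names made up for this example, not existing fop API, and the actual change would live inside PDFResources. Keeping one counter per prefix is also an assumption; the proposal only says "nn" would be a counter maintained by PDFResources.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of centrally assigned PDF resource names.
 * Not part of fop; all names here are invented for illustration.
 */
final class ResourceNameAllocator {

    // Last counter value handed out for each prefix ("F", "GS", ...).
    private final Map<String, Integer> counters = new HashMap<>();

    // Every name currently in use in the resource dictionary.
    private final Set<String> used = new HashSet<>();

    /**
     * Records a name that was chosen elsewhere (for example one already
     * written to the dictionary). Returns false if it was already taken.
     */
    boolean reserve(String name) {
        return used.add(name);
    }

    /**
     * Assigns the next free name for the given prefix, e.g. "F" gives
     * "F1", then "F2". Names that were reserved separately are skipped.
     */
    String assign(String prefix) {
        int n = counters.getOrDefault(prefix, 0);
        String name;
        do {
            n++;
            name = prefix + n;
        } while (!used.add(name)); // add() returns false when the name is taken
        counters.put(prefix, n);
        return name;
    }
}
```

With an allocator like this, a font copied in from an overlaid PDF would be renamed to whatever assign("F") returns, and the copied content stream would be rewritten to use that name. The caller never picks the name itself, which is the point of having PDFResources assign names rather than ask each resource for one.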
