On 07/03/12 16:35, mehdi houshmand wrote:
>> * Insert the concatenated content streams from the source PDF into the
>> output content stream. They must be surrounded by appropriate graphics
>> state save and restore operators and any necessary scale/position
>> operations to place the content where you want it.
>
> HA HA!! Incorrect! If you look into the nooks and crannies of the PDF
> spec, you'll see that it's possible to use content stream arrays for
> the /Page content stream.

Sure - that's why I said the content stream(s) had to be concatenated before insertion, because the input might be an array of content streams.

I was thinking that to get reliable results when overlaying you'd have to wrap the whole series of drawing operations from the input in state saving/restoring operations, etc, thus having to concatenate the streams before wrapping. In retrospect, that's not true; one can just as well wrap each copied content stream in state save/restore and scale/position operations.

It might even be possible to get away without a graphics state save/restore, but I don't think so. IIRC multiple content streams are treated by the reader as if they were one concatenated stream, so you still have to save/restore gstate to ensure the inserted stream doesn't mess up anything after it. I'll have to check this in the PDF ref, though.

> I'll leave exploring that to you, but
> basically it makes overlaying pages much much simpler. In related
> news, PDFBox does just that!! What we did (and it's super hack, but it
> worked) is if there were pages with both PDF-image content and FOP
> generated content, we'd get FOP to generate the content without the
> PDF-image and just overlay the pages. Best of both worlds!! (Though
> the purist in me is very much aggrieved)

Urk, that's horrible! Effective, I expect. Presumably you still have to translate, scale, and rotate, then clip the content stream you're overlaying, though.

> [snip]
>
> The more you describe your problem, the more it sounds like you need
> to do exactly what we did, but just to be sure, I thought I'd explain
> how we got there. Assumptions are a dangerous thing and I've probably
> made some about your issue too.

Given what you've described I'm inclined to agree that the cause of the issues is the same. I suspect we're facing the same problem or very similar problems, in which case my RIP crash issues may not be font related after all.
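
To make the wrapping idea above concrete, here is a rough sketch of surrounding one copied content stream with a graphics state save (q), a scale/translate matrix (cm), a clip rectangle (re W n), and a restore (Q). The class and method names are made up for illustration and are not part of FOP or PDFBox; rotation is left out, and the clip rectangle assumes the source page box starts at the origin.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of FOP or PDFBox. */
final class ContentStreamWrapper {
    private ContentStreamWrapper() {}

    /** Formats a finite number as a PDF real: plain decimal, no exponent. */
    static String num(double v) {
        // BigDecimal.valueOf throws NumberFormatException for NaN/infinity.
        return BigDecimal.valueOf(v)
                .setScale(5, RoundingMode.HALF_UP)
                .stripTrailingZeros()
                .toPlainString();
    }

    /**
     * Wraps copied content-stream bytes so they are drawn scaled by (sx, sy),
     * moved to (tx, ty), clipped to a clipW x clipH box in source units,
     * and so that their graphics state changes are undone afterwards.
     */
    static byte[] wrap(byte[] content, double sx, double sy,
                       double tx, double ty, double clipW, double clipH) {
        String prefix = "q\n"                                        // save gstate
                + num(sx) + " 0 0 " + num(sy) + " "
                + num(tx) + " " + num(ty) + " cm\n"                  // scale + translate
                + "0 0 " + num(clipW) + " " + num(clipH) + " re W n\n"; // clip path
        byte[] head = prefix.getBytes(StandardCharsets.ISO_8859_1);
        byte[] tail = "\nQ\n".getBytes(StandardCharsets.ISO_8859_1);    // restore gstate

        ByteArrayOutputStream out =
                new ByteArrayOutputStream(head.length + content.length + tail.length);
        out.write(head, 0, head.length);
        out.write(content, 0, content.length);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }
}
```

For example, ContentStreamWrapper.wrap(bytes, 0.5, 0.5, 100, 200, 612, 792) would draw a US Letter sized source page at half size with its lower-left corner at (100, 200). This only isolates the inserted stream if its own q/Q operators are balanced, which is an assumption about the input.
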
I still want to fix the font issues because, RIP-crash-causing or not, the font subset duplication produces massively bloated PDFs that are totally unsuitable for online distribution. It's kind of disheartening to learn that the RIP crash issues are probably something else entirely, since I thought I at least had only one problem to solve.

As for doing exactly what you did: I'd certainly be very interested in seeing your PDFBox code for loading the FOP-generated PDF, finding the placeholders, and overlaying the PDF graphics over them. In particular I'd like to see how you handled scaling/translation/rotation/clipping when drawing the copied streams, and how you handled state saving and restoration.

I can see overlaying over placeholders in post-processing as a really useful interim solution, though eventually I'd like to enhance fop-pdf-image to do that overlaying directly.

The really frustrating thing is that sometimes using an XObject will be exactly the right thing to do, because the PDF being embedded actually appears multiple times in the document. The solution to this links neatly into the font de-duplication issue: FOP image plugins need a way to store per-render-run information, in this case so they can determine how often an image occurs in a document during the preload run and make an appropriate decision about how to embed it.

I'm not sure it's even necessary to have an image plugin API change for this; plugins should be able to store enough information in a WeakHashMap<FOUserAgent,...> to figure it out, so I should be able to make fop-pdf-image use form XObjects only for PDF images referenced multiple times.

--
Craig Ringer
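
P.S. A rough sketch of the WeakHashMap idea, to show the shape of it. The class and method names are hypothetical, and the key is typed as Object so the sketch compiles without FOP on the classpath; in a plugin it would be the FOUserAgent. It assumes one user agent per render run, and that the preload pass calls recordUse once per image reference.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical per-render-run usage counter; not an existing FOP API. */
final class PerRunImageUsage {
    // Weak keys: once a run's user agent is garbage collected, its counts
    // become eligible for removal. The values must never hold a strong
    // reference back to the key, or the entry would never be released.
    private final Map<Object, Map<String, Integer>> countsByRun = new WeakHashMap<>();

    /** Called during the preload pass each time an image URI is referenced. */
    synchronized void recordUse(Object userAgent, String imageUri) {
        countsByRun
                .computeIfAbsent(userAgent, k -> new HashMap<>())
                .merge(imageUri, 1, Integer::sum);
    }

    /** True only if this run referenced the image more than once. */
    synchronized boolean shouldUseFormXObject(Object userAgent, String imageUri) {
        Map<String, Integer> counts = countsByRun.get(userAgent);
        return counts != null && counts.getOrDefault(imageUri, 0) > 1;
    }
}
```

The methods are synchronized because WeakHashMap is not thread-safe and several render runs might share one plugin instance; whether that actually happens in FOP is something I'd still need to confirm.
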
