Re: Fwd: fop-pdf-image and fonts; as requested

Craig Ringer Wed, 07 Mar 2012 00:59:48 -0800

On 07/03/12 16:35, mehdi houshmand wrote:
>> * Insert the concatenated content streams from the source PDF into the
>> output content stream. They must be surrounded by appropriate graphics
>> state save and restore operators and any necessary scale/position
>> operations to place the content where you want it.
> 
> HA HA!! Incorrect! If you look into the nooks and crannies of the PDF
> spec, you'll see that it's possible to use content stream arrays for
> the /Page content stream.


Sure - that's why I said the content stream(s) had to be concatenated
before insertion, because the input might be an array of content streams.
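
A minimal sketch of that concatenation step, assuming the streams from the
/Contents array have already been decoded to raw operator bytes. The names
ContentStreams and concatenate are made up for illustration; they are not
PDFBox or FOP API. A newline is written after each part so the last token of
one stream cannot run into the first token of the next.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

final class ContentStreams {
    /**
     * Joins already-decoded content streams in array order.
     * Hypothetical helper: decoding (filter removal) is assumed to have
     * happened before this is called.
     */
    static byte[] concatenate(List<byte[]> decodedStreams) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : decodedStreams) {
            out.write(part, 0, part.length);
            out.write('\n'); // whitespace separator between adjacent streams
        }
        return out.toByteArray();
    }
}
```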

I was thinking that to get reliable results when overlaying you'd have
to wrap the whole series of drawing operations from the input in state
saving/restoring operations, etc, thus having to concatenate the streams
before wrapping. In retrospect, that's not true; one can just as well
wrap each copied content stream in state save/restore and scale/position
operations.
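
As a rough sketch of the per-stream wrapping, assuming the copied stream is
already decoded and the six matrix values have been worked out elsewhere:
the stream is bracketed with the q and Q operators, with a cm operator
directly after q to set scale and position. ContentStreamWrapper, wrap and
num are hypothetical names, not part of FOP or PDFBox.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.nio.charset.StandardCharsets;

final class ContentStreamWrapper {
    /** Returns: q, "a b c d e f cm", the copied content, then Q. */
    static byte[] wrap(byte[] content, double a, double b, double c,
                       double d, double e, double f) {
        String prefix = "q\n" + num(a) + " " + num(b) + " " + num(c) + " "
                + num(d) + " " + num(e) + " " + num(f) + " cm\n";
        byte[] head = prefix.getBytes(StandardCharsets.US_ASCII);
        byte[] tail = "\nQ\n".getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head, 0, head.length);
        out.write(content, 0, content.length);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    /** Formats a number without exponent notation, rounded to 5 decimals. */
    static String num(double v) {
        return BigDecimal.valueOf(v)
                .setScale(5, RoundingMode.HALF_UP)
                .stripTrailingZeros()
                .toPlainString();
    }
}
```

The save/restore pair only protects what follows if the copied stream's own
q and Q operators are balanced, which this sketch does not check.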

It might even be possible to get away without a graphics state
save/restore, but I don't think so. IIRC multiple content streams are
treated by the reader as if they were one concatenated stream, so you
still have to save/restore gstate to ensure the inserted stream doesn't
mess up anything after it. I'll have to check this in the PDF ref, though.

> I'll leave exploring that to you, but
> basically it makes overlaying pages much much simpler. In related
> news, PDFBox does just that!! What we did (and it's a super hack, but it
> worked) is if there were pages with both PDF-image content and FOP
> generated content, we'd get FOP to generate the content without the
> PDF-image and just overlay the pages. Best of both worlds!! (Though
> the purist in me is very much aggrieved)

Urk, that's horrible! Effective, though, I expect. Presumably you still
have to translate, scale and rotate, then clip the content stream you're
overlaying, though.
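
To make the translate/scale/clip part concrete, here is a sketch that
assumes the source page box and the target placeholder rectangle are both
known as (x, y, width, height) in PDF units. It builds the operators that
would precede the overlaid stream: q, a cm matrix mapping the source box
onto the target, and a clip to the source box. Rotation is left out to keep
it short. OverlayPlacement and its methods are hypothetical names, not
PDFBox API.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class OverlayPlacement {
    /** Operators to emit before the overlaid content; a matching Q must follow it. */
    static String prefix(double srcX, double srcY, double srcW, double srcH,
                         double dstX, double dstY, double dstW, double dstH) {
        double a = dstW / srcW;        // horizontal scale
        double d = dstH / srcH;        // vertical scale
        double e = dstX - a * srcX;    // translation that moves the source
        double f = dstY - d * srcY;    // origin onto the target origin

        return "q\n"
                + num(a) + " 0 0 " + num(d) + " " + num(e) + " " + num(f) + " cm\n"
                // After cm we are in source coordinates, so clip to the source box.
                + num(srcX) + " " + num(srcY) + " " + num(srcW) + " " + num(srcH)
                + " re W n\n";
    }

    /** Formats a number without exponent notation, rounded to 5 decimals. */
    static String num(double v) {
        return BigDecimal.valueOf(v)
                .setScale(5, RoundingMode.HALF_UP)
                .stripTrailingZeros()
                .toPlainString();
    }
}
```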

> [snip]
>
> The more you describe your problem, the more it sounds like you need
> to do exactly what we did, but just to be sure, I thought I'd explain
> how we got there. Assumptions are a dangerous thing and I've probably
> made some about your issue too.

Given what you've described I'm inclined to agree that the cause of the
issues is the same. I suspect we're facing the same problem or very
similar problems, in which case my RIP crash issues may not be font
related after all.

I still want to fix the font issues because, RIP-crash-causing or not,
the font subset duplication produces massively bloated PDFs that are
totally unsuitable for online distribution. It's kind of disheartening
to learn that the RIP crash issues are probably something else entirely,
since I thought I at least had to solve only one problem.

As for doing exactly what you did: I'd certainly be very interested in
seeing your PDFBox code for loading the FOP-generated PDF, finding the
placeholders, and overlaying the PDF graphics over them. In particular
I'd like to see how you handled scaling/translation/rotation/clipping
when drawing the copied streams, and how you handled state saving and
restoration.

I can see overlaying over placeholders in post-processing as a really
useful interim solution, though eventually I'd like to enhance
fop-pdf-image to do that overlaying directly.

The really frustrating thing is that sometimes using an XObject will be
exactly the right thing to do, because the PDF being embedded actually
appears multiple times in the document. The solution to this links
neatly into the font de-duplication issue: fop image plugins need a way
to store per-render-run information, in this case so they can determine
how often an image occurs in a document during the preload run and make
an appropriate decision about how to embed it. I'm not sure it's even
necessary to have an image plugin API change for this; plugins should be
able to store enough information in a WeakHashMap<FOUserAgent,...> to
figure it out, so I should be able to make fop-pdf-image use form
XObjects only for PDF images referenced multiple times.
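
A sketch of that bookkeeping, under the assumption that the plugin gets
called once per image reference during the preload run. The key type is
left generic here so the example stands alone; in FOP it would be the
FOUserAgent for the render run. PerRunImageUsage and its methods are
hypothetical names, not existing FOP API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

final class PerRunImageUsage<K> {
    // Weak keys: an entry can be collected once the run's key object is
    // no longer referenced elsewhere. The values must not hold a strong
    // reference back to the key, or the entry would never be collected.
    private final Map<K, Map<String, Integer>> counts = new WeakHashMap<>();

    /** Call once per reference to an image during the preload run (assumed hook). */
    synchronized void recordReference(K run, String imageUri) {
        counts.computeIfAbsent(run, k -> new HashMap<>())
              .merge(imageUri, 1, Integer::sum);
    }

    /** True only if the image was referenced more than once in this run. */
    synchronized boolean shouldUseFormXObject(K run, String imageUri) {
        Map<String, Integer> perRun = counts.get(run);
        return perRun != null && perRun.getOrDefault(imageUri, 0) > 1;
    }
}
```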

--
Craig Ringer
