On 06/03/2012 11:08, mehdi houshmand wrote:

Hi Mehdi,

Font de-duping is intrinsically a post-process action, you need the
full document, with all fonts, before you can do any font de-duping.
PostScript does this very thing (to a much lesser extent) with the
<optimize-resources>  tag, as a post-process action.
At least that is transparent to the user, but re-parsing the input is a sub-optimal solution as it incurs a performance penalty, so we should investigate whether there are alternatives first. I can't recall why the PostScript Painter/Renderer was architected in that way, but that's a separate topic.

Also, the requirements aren't clear here, what is it we want here? Let
me validate that, this shouldn't change the (I guess we can call it)
"canonical" PDF document. By that I mean if you rasterized a PDF
before and after this change they should be identical,
pixel-for-pixel. When Acrobat does the font de-duping (I don't
remember how much control it gives you, but if there are levels of
de-duping I would have chosen the most aggressive), the documents
aren't identical. There are aberrations caused by slight kerning
differences between various versions of Arial. This may seem trivial
when compared to bloated PDFs, but it looks tacky and lowers the high
standard of documents. You could argue this could be configurable...
But then I'd re-iterate my first argument, this is a post-process
action, not the concern of FOP or the pdf-image-plugin.
The requirements are perfectly clear: given a set of input PDFs and XSL-FO, create a single merged PDF with a consistent and unduplicated set of fonts. Why would there be slight kerning differences if the assumption that the font name is unique holds true? If that assumption is wrong then I agree with what you say. Ultimately that should be down to the user though; they know their fonts, so they can decide whether to merge them or not via a setting in the fop.xconf. Your argument is not sufficient to say this approach should never be used. It brings a lot of benefit to users who know their font names are unique.
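
As a rough illustration of that requirement, here is a minimal sketch of de-duplication keyed purely on font name, with a user-controlled switch. Everything in it is an assumption made for illustration: the `FontMergeSketch` class, the `EmbeddedFont` type and the `mergeFontsByName` flag are hypothetical and are not existing FOP, fop-pdf-image or fop.xconf names. It only shows the decision logic, not how fonts would actually be read from or written to a PDF.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FontMergeSketch {

    /** Hypothetical stand-in for a font embedded in one of the input PDFs. */
    static final class EmbeddedFont {
        final String name;
        final String sourceDocument;

        EmbeddedFont(String name, String sourceDocument) {
            this.name = name;
            this.sourceDocument = sourceDocument;
        }
    }

    /**
     * Returns the fonts to embed in the merged output.
     * mergeFontsByName is a hypothetical user setting: when true, fonts that
     * share a name are assumed to be identical and only the first is kept.
     */
    static List<EmbeddedFont> collectFonts(List<EmbeddedFont> fonts,
                                           boolean mergeFontsByName) {
        if (!mergeFontsByName) {
            // The user does not trust font names to be unique: keep every font.
            return new ArrayList<>(fonts);
        }
        // LinkedHashMap keeps the first occurrence of each name, in input order.
        Map<String, EmbeddedFont> byName = new LinkedHashMap<>();
        for (EmbeddedFont font : fonts) {
            byName.putIfAbsent(font.name, font);
        }
        return new ArrayList<>(byName.values());
    }

    public static void main(String[] args) {
        List<EmbeddedFont> fonts = new ArrayList<>();
        fonts.add(new EmbeddedFont("Arial", "a.pdf"));
        fonts.add(new EmbeddedFont("Arial", "b.pdf"));
        fonts.add(new EmbeddedFont("Helvetica", "b.pdf"));

        System.out.println(collectFonts(fonts, true).size());  // prints 2
        System.out.println(collectFonts(fonts, false).size()); // prints 3
    }
}
```

With the switch off, the output keeps every embedded font, which is the safe choice when two different versions of a font may share a name.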

The other issue is you have subset fonts created by FOP as well as
those imported by the pdf-image-plugin. You'd have to create some
bridge between the image loading framework and the font loading system
*cough* HACK *cough*. Alternatively, just thinking aloud here, if this
was done as a post-process *wink* *wink* *wry smile*...
Jeremias and Craig have already sent e-mails on this topic. It is perfectly valid for any image loaded via the image loading framework to pass around contextual information. If the changes are done properly then it is not a hack. Sure, there are some easy ways to do it that classify as a hack, but I prefer to follow the approach outlined by Jeremias in one of his off-list e-mails about storing contextual information for images loaded via the image loading framework.

Apologies if I may seem to be argumentative here, it's not my
intention, but I feel this would be serious scope creep. I see the
pdf-image-plugin as a plugin that treats PDFs as images, nothing more.
If you want to stitch together PDFs, PDFBox is designed just for that.
It's true that this work touches more than FOP, but I don't see that as a good argument against using this as a GSoC project. All the code that this touches is open source, with the exception of the image loader plug-in, and that is something the PMC is discussing with Jeremias.




On 6 March 2012 10:36, Chris Bowditch<bowditch_ch...@hotmail.com>  wrote:
On 06/03/2012 10:12, mehdi houshmand wrote:
I fat-fingered the reply button instead of reply-to-all... *face-palm*

Mehdi, Craig,

- Anything in the proposed XSL-FO 2.0 feature list (though most of it
be realistic for GSoC projects);

- Merge fop-pdf-image and implement smart merging of font, profile, and
image resources. I'm working on this one at the moment, but slowly and
amid other projects.
I really don't think that's a suitable project, I responded to your
post so maybe we could take this conversation elsewhere, but this
really isn't FOP's responsibility, or for that matter the
pdf-image-plugin. If anything, I'd argue that's a PDFBox project,
Adobe Acrobat Pro does this kind of thing (badly may I add) as a
post-process action and I think that's the correct way to do it. The
other thing to say is that a newcomer may not appreciate the
importance of fidelity when fonts are concerned. Basically it's too
difficult for a student given a few months and no previous experience.
Sorry Mehdi I don't agree. I think this would be a great project. Craig
already outlined what needs to be done and there's a lot of stuff in XGC and
FOP as well as the plug-in. I'm not sure anything is needed in PDFBox, but
even if it is, that is an Apache project too and the student can submit patches
there. Adobe Acrobat may make some assumptions that don't always hold true,
but our customers are crying out for FOP to create smaller PDF files when
importing multiple PDF images with embedded fonts. This also feels
reasonably well defined thanks to Craig's list of TODOs and feels like it
can be done in 3 months. It gets a +1 from me.



Craig Ringer
