Re: Fwd: Google Summer of Code

Chris Bowditch Tue, 06 Mar 2012 03:30:24 -0800

On 06/03/2012 11:08, mehdi houshmand wrote:

Hi Mehdi,

Font de-duping is intrinsically a post-process action, you need the
full document, with all fonts, before you can do any font de-duping.
PostScript does this very thing (to a much lesser extent) with the
<optimize-resources>  tag, as a post-process action.

At least that is transparent to the user, but re-parsing the input is a sub-optimal solution as it incurs a performance penalty, so we should investigate if there are alternatives first. I can't recall why the PostScript Painter/Renderer was architected in that way, but that's a separate topic.


Also, the requirements aren't clear here, what is it we want here? Let
me validate that, this shouldn't change the (I guess we can call it)
"canonical" PDF document. By that I mean if you rasterized a PDF
before and after this change they should be identical,
pixel-for-pixel. When Acrobat does the font de-duping (I don't
remember how much control it gives you, but if there are levels of
de-duping I would have chosen the most aggressive), the documents
aren't identical. There are aberrations caused by slight kerning
differences between various versions of Arial. This may seem trivial
when compared to bloated PDFs, but it looks tacky and lowers the high
standard of documents. You could argue this could be configurable...
But then I'd re-iterate my first argument, this is a post-process
action, not the concern of FOP or the pdf-image-plugin.

The requirements are perfectly clear: given a set of input PDFs and XSL-FO, create a single merged PDF with a consistent and unduplicated set of fonts. Why would there be slight kerning differences if the assumption that the font name is unique holds true? If that assumption is wrong then I agree with what you say. Ultimately that should be down to the user though: they know their fonts, so they can decide whether to merge them or not via a setting in the fop.xconf. Your argument is not sufficient to say this approach should never be used. It brings a lot of benefit to users who know their font names are unique.
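
To make the requirement concrete, here is a rough sketch of the opt-in behaviour, written in Python purely for illustration. Nothing here is FOP or plug-in code: EmbeddedFont, merge_fonts and the merge_by_name flag are hypothetical names standing in for whatever the real setting and font objects would be, and the sketch ignores font subsetting entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddedFont:
    """Hypothetical stand-in for a font embedded in an input PDF."""
    name: str    # font name as declared in the source document
    source: str  # which input document the font came from


def merge_fonts(fonts, merge_by_name):
    """Return the fonts that would be embedded in the merged output.

    merge_by_name models the user's opt-in setting (an assumption, not
    a real option): when True, fonts sharing a name are treated as
    identical and only the first one seen is kept.
    """
    if not merge_by_name:
        return list(fonts)  # keep every embedded copy untouched
    kept = {}
    for font in fonts:
        kept.setdefault(font.name, font)  # first occurrence wins
    return list(kept.values())


fonts = [
    EmbeddedFont("Arial", "invoice.pdf"),
    EmbeddedFont("Arial", "terms.pdf"),
    EmbeddedFont("Courier", "terms.pdf"),
]
merged = merge_fonts(fonts, merge_by_name=True)
```

With the flag off nothing is merged, so users who cannot vouch for their font names keep today's output unchanged.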


The other issue is you have subset fonts created by FOP as well as
those imported by the pdf-image-plugin. You'd have to create some
bridge between the image loading framework and the font loading system
*cough* HACK *cough*. Alternatively, just thinking aloud here, if this
was done as a post-process *wink* *wink* *wry smile*...

Jeremias and Craig have already sent e-mails on this topic. It is perfectly valid for any image loaded via the image loading framework to pass around contextual information. If the changes are done properly then it is not a hack. Sure, there are some easy ways to do it that qualify as a hack, but I prefer to follow the approach outlined by Jeremias in one of his off-list e-mails about storing contextual information for images loaded via the image loading framework.


Apologies if I may seem to be argumentative here, it's not my
intention, but I feel this would be serious scope creep. I see the
pdf-image-plugin as a plugin that treats PDFs as images, nothing more.
If you want to stitch together PDFs, PDFBox is designed just for that.

It's true that this work touches more than FOP, but I don't see that as a good argument against using this as a GSoC project. All the code that this touches is open source, with the exception of the image loader plug-in, and that is something the PMC is discussing with Jeremias.


Thanks,

Chris


Mehdi

On 6 March 2012 10:36, Chris Bowditch<[email protected]>  wrote:

On 06/03/2012 10:12, mehdi houshmand wrote:

I fat-fingered the reply button instead of reply-to-all... *face-palm*


Mehdi, Craig,
<snip/>

- Anything in the proposed XSL-FO 2.0 feature list (though most of it
won't be realistic for GSoC projects);

- Merge fop-pdf-image and implement smart merging of font, profile, and
image resources. I'm working on this one at the moment, but slowly and
only amid other projects.

I really don't think that's a suitable project. I responded to your
post so maybe we could take this conversation elsewhere, but this
really isn't FOP's responsibility, or for that matter the
pdf-image-plugin's. If anything, I'd argue that's a PDFBox project.
Adobe Acrobat Pro does this kind of thing (badly, may I add) as a
post-process action and I think that's the correct way to do it. The
other thing to say is that a newcomer may not appreciate the
importance of fidelity where fonts are concerned. Basically it's too
difficult for a student given a few months and no previous experience.

Sorry Mehdi, I don't agree. I think this would be a great project. Craig
already outlined what needs to be done and there's a lot of stuff in XGC and
FOP as well as the plug-in. I'm not sure anything is needed in PDFBox, but
even if it is, that is an Apache project too and the student can submit patches
there. Adobe Acrobat may make some assumptions that don't always hold true,
but our customers are crying out for FOP to create smaller PDF files when
importing multiple PDF images with embedded fonts. This also feels
reasonably well defined thanks to Craig's list of TODOs and feels like it
can be done in 3 months. It gets a +1 from me.

Thanks,

Chris

--
Craig Ringer
