On 03/04/12 01:16, Vincent Hennebert wrote:

>> From a quick look that sounds about right. Are you developing with a 1.7 JDK?
>> You would have to make your code 1.5-compatible. Also, it would be good if
>> you could back your optimizations with profiling data. If code
>> safety/readability have to be compromised there has to be a good reason
>> for it.

Yep, I'm on 1.7, but was building for a 1.5 target and source level.

The posted PDFName is really an example of what I'm thinking of; it's
not something that can be seriously merged without a *lot* of changes
across the rest of the tree to convert String name handling to PDFName,
to check for leading "/"s being added, to check for use of toString() in
name output to PDF data, etc. It's more a starting point for discussion
and to see what kind of interest there is in working on the pdf lib.
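
As a rough illustration of the kind of name type meant here, a minimal sketch follows. It is not the posted PDFName class; `PdfNameSketch`, `of()` and `toPdfSyntax()` are hypothetical names made up for this example. It assumes names are stored without the leading "/" and that serialization is done by an explicit method rather than by toString().

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical value type for a PDF name; NOT the PDFName class posted in this thread. */
final class PdfNameSketch {
    private final String name; // stored WITHOUT the leading "/"

    private PdfNameSketch(String name) {
        this.name = name;
    }

    /** Accepts "Type" or "/Type", so callers that already add the slash still work. */
    static PdfNameSketch of(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        return new PdfNameSketch(raw.startsWith("/") ? raw.substring(1) : raw);
    }

    /** Serialized form for PDF output: "/" plus the name, with #xx escapes where needed. */
    String toPdfSyntax() {
        StringBuilder sb = new StringBuilder("/");
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean special = "()<>[]{}/%#".indexOf(c) >= 0;
            if (c < '!' || c > '~' || special) {
                // whitespace, delimiters, '#' and non-printable bytes are hex-escaped
                sb.append('#').append(String.format("%02X", c));
            } else {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PdfNameSketch && ((PdfNameSketch) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }

    /** Debug form only; PDF output should go through toPdfSyntax(). */
    @Override
    public String toString() {
        return name;
    }
}
```

Keeping toString() separate from the output method means an accidental string concatenation shows up as a missing "/" rather than silently producing output that happens to look right.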

It looks like folks here are pretty receptive to significantly reworking
the PDF lib if there's someone willing to stump up the time. That might
well be me if porting the whole renderer to PDFBox turns out to be a lot
harder than reworking the fop pdf lib. I suspect not.

>
> That said, please do feel free to give it a try if that route appeals to
> you. We would certainly consider a switch if it looks more promising in
> the long term.

I'm wondering how practical it'd be to progressively adopt PDFBox,
actually, rather than doing an abrupt and total switch.

The PDF primitives in PDFBox (COSName, COSNumber, COSDictionary, etc)
are modeled very similarly to those in FOP's PDF library, and while
they won't be drop-in replacements they behave very similarly, just
with different method names and in some cases different datatype
assumptions (COSName instead of String dictionary keys for example). The
only truly big difference looks to be in the handling of indirect
objects, where FOP uses one class that may be direct or indirect, and
PDFBox uses a dedicated `COSObject' class that wraps other objects for
indirect objects.
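
To make the two indirect-object models concrete, here is a toy sketch. None of these classes are the real FOP or PDFBox types; `DualModeObject`, `SketchValue`, `DirectValue` and `IndirectRef` are hypothetical stand-ins that only mimic the shape described above.

```java
/** Model A: one class whose instances are direct until given an object number. */
final class DualModeObject {
    private final String body;
    private int objectNumber; // 0 means "direct, no object number assigned"

    DualModeObject(String body) {
        this.body = body;
    }

    void makeIndirect(int objectNumber) {
        this.objectNumber = objectNumber;
    }

    boolean isIndirect() {
        return objectNumber > 0;
    }

    /** What a parent dictionary or array would write for this object. */
    String emitAsValue() {
        return isIndirect() ? objectNumber + " 0 R" : body;
    }
}

/** Model B: values are always direct; indirection is a separate wrapper type. */
interface SketchValue {
    String emitAsValue();
}

final class DirectValue implements SketchValue {
    private final String body;

    DirectValue(String body) {
        this.body = body;
    }

    public String emitAsValue() {
        return body;
    }
}

final class IndirectRef implements SketchValue {
    private final SketchValue wrapped;
    private final int objectNumber;
    private final int generation;

    IndirectRef(SketchValue wrapped, int objectNumber, int generation) {
        this.wrapped = wrapped;
        this.objectNumber = objectNumber;
        this.generation = generation;
    }

    SketchValue getWrapped() {
        return wrapped;
    }

    /** A parent writes only the reference, e.g. "7 0 R"; the body is written elsewhere. */
    public String emitAsValue() {
        return objectNumber + " " + generation + " R";
    }
}
```

In model A every consumer has to ask the object which mode it is in; in model B the type itself says so, which is why code written against one model does not port mechanically to the other.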

If it proves possible to do a largely 1:1 substitution of PDFBox
primitive PDF classes for FOP ones that'd *greatly* simplify the job of
moving to using the PDFBox `PD' model for document output, and would let
the job be done in a couple of distinct and separate chunks. PDFFont,
PDFXObject, etc would build on top of the COS types like COSDictionary,
COSBase and COSObject, rather than on top of PDFObject, PDFDictionary, etc.

It's not a perfect 1:1 mapping; notable imperfect matches where one or
more classes map to one or more different classes include:

  FOP                 PDFBOX
  ---------------------------------------------------
  java.lang.String,   COSName (for PDF names)
  FOP PDFText         java.lang.String (for Unicode text)
                      byte[] or a binbuf/binstr class (for binary data)
  ---------------------------------------------------
  PDFNumber           COSNumber
                      COSInteger
                      COSFloat
  ---------------------------------------------------
  ??? (j.l.Boolean?)  COSBoolean
  ---------------------------------------------------
  PDFObject           COSBase (direct)
                      COSObject (wraps a COSBase, indirect)
  ---------------------------------------------------
  PDFDocument         COSDocument
                      [+ a new class with fop-only functionality]
                      PDDocument


Of those, the first already needs a cleanup as noted previously; fop
uses String for at least two distinct and incompatible things (Unicode
text and raw binary data) that must be separated anyway. The second
doesn't look to be a big deal, as COSNumber has factory methods that
take care of it pretty transparently. I'm not too worried about boolean
handling either. The bigger ones are the different model for indirect
objects, and the non-trivial work to move PDFDocument to extend/wrap
COSDocument. I haven't looked into this in enough depth to have a
meaningful assessment of how hard either of those would be yet.
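
A small stdlib-only sketch of why text and raw binary data can't safely share java.lang.String: arbitrary bytes only survive the trip through a String if every caller agrees on a byte-preserving charset. The class and method names are hypothetical, and the sketch makes no claim about which charset FOP actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo class; shows round-tripping raw bytes through java.lang.String. */
final class TextVsBinarySketch {

    /** True if the bytes come back unchanged after decoding and re-encoding as UTF-8. */
    static boolean survivesUtf8(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        return Arrays.equals(data, s.getBytes(StandardCharsets.UTF_8));
    }

    /** Same check with ISO-8859-1, which maps each byte value to exactly one char. */
    static boolean survivesLatin1(byte[] data) {
        String s = new String(data, StandardCharsets.ISO_8859_1);
        return Arrays.equals(data, s.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE are not valid UTF-8, so decoding replaces them
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        System.out.println("UTF-8 round trip intact:   " + survivesUtf8(raw));
        System.out.println("Latin-1 round trip intact: " + survivesLatin1(raw));
    }
}
```

With a dedicated byte[]-backed type for binary data, the charset question never arises for it, and String can be reserved for real Unicode text.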
 
OTOH, some of the important ones map pretty directly, allowing for the
differences in handling of indirect and direct objects higher up the
inheritance tree:

 FOP                 PDFBOX
 --------------------------------------------------
 PDFDictionary       COSDictionary
 PDFArray            COSArray
 PDFStream           COSStream
 PDFNull             COSNull


If I get time (big if, at the moment) I'll see if I can have a play and
determine what kind of work is involved in doing this.

--
Craig Ringer
