Hello Ben,

I've been using PDFBox for the last year, but only version 0.6.3,
for two reasons:

 1) I tried to migrate to newer versions (0.6.4, 0.6.5, 0.6.6), but every
 time I had problems parsing the same PDF documents that worked well
 with 0.6.3. I described my problems here:
  https://sourceforge.net/tracker/?func=detail&atid=552832&aid=1021691&group_id=78314

 2) When I started with 0.6.3 I experienced performance problems
 too, especially with large PDF documents (I had several larger
 than 20MB). I changed the source a bit, wrapping the following line
 of the BaseParser class:

            out = stream.createFilteredStream( streamLength );

 to

            out = new BufferedOutputStream( stream.createFilteredStream( streamLength ) );

 The performance increase I got was huge:
 parsing a 21MB PDF document to text took 78 seconds before the
 modification and 12 seconds after it, so more than 6 times faster.

 I also tried using buffered streams in some other places, but the
 effect was not as visible. I hope this change can also be incorporated
 into the current 0.6.6 release, and then the benchmarks may come out
 on PDFBox's side :)
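
 To show why the wrapper can matter, here is a minimal, self-contained
 sketch. It does not use PDFBox at all: CountingOutputStream is a
 hypothetical stand-in for whatever createFilteredStream() returns, and
 the class and method names are made up for illustration. It only counts
 how many write calls reach the underlying stream when the same bytes are
 written one at a time, with and without a BufferedOutputStream in front.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class BufferingSketch {

    /** Hypothetical stand-in for the filtered stream: counts the calls and bytes that reach it. */
    static class CountingOutputStream extends OutputStream {
        long calls = 0;
        long bytes = 0;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] buf, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    /**
     * Writes n bytes one at a time, optionally through a BufferedOutputStream.
     * Returns { calls seen by the underlying stream, bytes seen by the underlying stream }.
     */
    static long[] writeBytes(int n, boolean buffered) {
        CountingOutputStream sink = new CountingOutputStream();
        OutputStream out = buffered ? new BufferedOutputStream(sink) : sink;
        try {
            for (int i = 0; i < n; i++) {
                out.write(i & 0xFF);
            }
            // Push any bytes still held in the buffer down to the sink.
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] { sink.calls, sink.bytes };
    }

    public static void main(String[] args) {
        long[] direct = writeBytes(100000, false);
        long[] wrapped = writeBytes(100000, true);
        System.out.println("unbuffered: calls=" + direct[0] + " bytes=" + direct[1]);
        System.out.println("buffered:   calls=" + wrapped[0] + " bytes=" + wrapped[1]);
    }
}
```

 Both runs deliver the same bytes, but the buffered one reaches the
 underlying stream far less often. Whether that translates into a speedup
 like the one above depends on how expensive each underlying write is, so
 treat this as an illustration of the mechanism, not as a benchmark.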


 Max


BL> On Wed, 8 Sep 2004, Chas Emerick wrote:
>> PDFTextStream: fast PDF text extraction for Java applications
>> http://snowtide.com/home/PDFTextStream/


BL> For those that have not seen, snowtide.com has done a performance
BL> comparison against several Java PDF->Text libraries, including Snowtide's
BL> PDFTextStream, PDFBox, Etymon PJ and JPedal.  It appears to be fairly well
BL> done.

BL> http://snowtide.com/home/PDFTextStream/Performance


BL> PDFBox: slow PDF text extraction for Java applications
BL> http://www.pdfbox.org

BL> :)

BL> Ben


BL> ---------------------------------------------------------------------
BL> To unsubscribe, e-mail: [EMAIL PROTECTED]
BL> For additional commands, e-mail: [EMAIL PROTECTED]




-- 
Best regards,
 Maxim                            mailto:[EMAIL PROTECTED]


