Hello Ben,

I've been using PDFBox for the past year, but only version 0.6.3, for two reasons:

1) I tried to migrate to newer versions (0.6.4, 0.6.5, 0.6.6), but each time I had problems parsing the same PDF documents that worked well with 0.6.3. I described my problems here:
https://sourceforge.net/tracker/?func=detail&atid=552832&aid=1021691&group_id=78314

2) When I started with 0.6.3 I experienced performance problems too, especially with large PDF documents (I had several larger than 20MB). I changed the source a bit, replacing the following line of the BaseParser class:

    out = stream.createFilteredStream( streamLength );

with

    out = new BufferedOutputStream(stream.createFilteredStream( streamLength ));

The performance increase I got was huge: parsing a 21MB PDF document to text took 78 seconds before the modification and 12 seconds after it, so more than 6 times faster. I also tried using buffered streams in some other places, but the effect there was not as visible.

I hope this change can also be incorporated into the current 0.6.6 release, and then the benchmarks may stay on PDFBox's side :)

Max

BL> On Wed, 8 Sep 2004, Chas Emerick wrote:
>> PDFTextStream: fast PDF text extraction for Java applications
>> http://snowtide.com/home/PDFTextStream/

BL> For those that have not seen, snowtide.com has done a performance
BL> comparison against several Java PDF->Text libraries, including Snowtide's
BL> PDFTextStream, PDFBox, Etymon PJ and JPedal. It appears to be fairly well
BL> done.

BL> http://snowtide.com/home/PDFTextStream/Performance

BL> PDFBox: slow PDF text extraction for Java applications
BL> http://www.pdfbox.org

BL> :)

BL> Ben

BL> ---------------------------------------------------------------------
BL> To unsubscribe, e-mail: [EMAIL PROTECTED]
BL> For additional commands, e-mail: [EMAIL PROTECTED]

-- 
Best regards,
 Maxim                          mailto:[EMAIL PROTECTED]
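
For readers who want to see why the BufferedOutputStream wrapping mentioned above can matter, here is a minimal, self-contained sketch. It is not PDFBox code: the class BufferedWrapDemo and the CountingOutputStream sink are hypothetical names made up for illustration, and the byte-at-a-time write loop is only an assumption about how a parser might copy stream data. It counts how many write calls reach the underlying stream with and without the wrapper.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class BufferedWrapDemo {
    /** Hypothetical stand-in for a costly underlying stream; counts the write calls that reach it. */
    static class CountingOutputStream extends OutputStream {
        long calls = 0;
        long bytes = 0;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    /** Writes n bytes one at a time (assumed access pattern), then flushes. */
    static void writeByteByByte(OutputStream out, int n) {
        try {
            for (int i = 0; i < n; i++) {
                out.write(i & 0xFF);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of write calls reaching the sink when it is used directly. */
    static long callsUnbuffered(int n) {
        CountingOutputStream sink = new CountingOutputStream();
        writeByteByByte(sink, n);
        return sink.calls;
    }

    /** Number of write calls reaching the sink when it is wrapped in a BufferedOutputStream. */
    static long callsBuffered(int n) {
        CountingOutputStream sink = new CountingOutputStream();
        // Same kind of one-line wrapping as the BaseParser change described above.
        writeByteByByte(new BufferedOutputStream(sink), n);
        return sink.calls;
    }

    public static void main(String[] args) {
        int n = 1 << 20; // 1 MiB of single-byte writes
        System.out.println("unbuffered write calls: " + callsUnbuffered(n));
        System.out.println("buffered write calls:   " + callsBuffered(n));
    }
}
```

The wrapper collects single-byte writes in memory and hands them to the underlying stream in larger chunks, so far fewer calls reach it. How much wall-clock time that saves depends on how expensive each call to the underlying stream is, so the 78 to 12 second figure above should be read as one measurement, not a general guarantee.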
