Hi All,

I am a committer for another Apache project (cTAKES) and have been using PDFBox in my own application for a while now by extending PDFTextStripper and overriding processTextPosition. While updating from 1.8 to 2.0-RC2 I came across a few items that look like they may be issues. My apologies if these have already been discussed; a quick search through JIRA turned up nothing obvious.
1. PDFTextStripper.processPages(...): This accepts a PDPageTree as its parameter, but the first line of the method instantiates a new PDPageTree by calling document.getPages(). Should it just use the passed-in pages parameter instead of working with two instances of PDPageTree?

2. The first line of processPages uses a document object that is null unless you call getText() first. Is the intended behavior that getText() must be called before processPages can be called?

3. processPage(…) doesn't appear to do anything unless it's called from processPages(…), because currentPageNo is not set if you call processPage(…) directly. This method probably can't be made private because it's an override, but should it either drop the check on currentPageNo or otherwise throw an exception / log a warning?

Cheers,
Britt

Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com
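
P.S. To make item 3 concrete, here is a stripped-down stand-in for the behavior I think I am seeing. This is not PDFBox source: the class StripperSketch, the String pages, the processed list, and the default values of currentPageNo, startPage and endPage are all assumptions made for illustration, and only the method and field names mirror the ones mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the page-range guard described in item 3.
// Nothing here is copied from PDFBox; defaults below are assumptions.
class StripperSketch {
    private int currentPageNo = 0;                // assumed: no page visited yet
    private final int startPage = 1;              // assumed: range starts at page 1
    private final int endPage = Integer.MAX_VALUE;

    // Records which pages actually got processed, for demonstration only.
    final List<String> processed = new ArrayList<>();

    // Stand-in for processPages(...): advances the counter before each page.
    void processPages(List<String> pages) {
        for (String page : pages) {
            currentPageNo++;
            processPage(page);
        }
    }

    // Stand-in for processPage(...): silently skips pages outside the range,
    // which includes every page when currentPageNo was never advanced.
    void processPage(String page) {
        if (currentPageNo >= startPage && currentPageNo <= endPage) {
            processed.add(page);
        }
    }

    public static void main(String[] args) {
        StripperSketch direct = new StripperSketch();
        direct.processPage("page-1");             // counter is still 0, so skipped
        System.out.println("direct: " + direct.processed.size());

        StripperSketch viaPages = new StripperSketch();
        viaPages.processPages(Arrays.asList("page-1", "page-2"));
        System.out.println("via processPages: " + viaPages.processed.size());
    }
}
```

In this sketch the direct call records nothing and gives the caller no signal, which is the silent no-op I am asking about; an exception or a logged warning in that branch would at least make the misuse visible.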