Hi All, I am a committer for another Apache project (cTAKES) and have been 
using PDFBox in my own application for a while now by extending PDFTextStripper 
and overriding processTextPosition.
While updating from 1.8 to 2.0-RC2 I came across a few items that look like 
they may be issues.
My apologies if these have already been discussed. I did a quick search through 
JIRA and found nothing obviously related.

1.
PDFTextStripper.processPages(...)
This accepts a PDPageTree as its parameter, but the first line of the method 
obtains another PDPageTree by calling document.getPages().
Should it just use the passed-in pages parameter instead of working with two 
instances of PDPageTree?

2.
The first line of processPages uses the document field, which is null unless 
getText() has been called first.
Is it intended that getText() must be called before processPages can be used?
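
To make items 1 and 2 concrete, here is a minimal sketch of the pattern as I understand it. It is not PDFBox source: PagesParamModel is a hypothetical stand-in, a List<String> plays the role of PDPageTree, and processPagesUsingParam is only one possible alternative, not a proposed patch.

```java
import java.util.List;

// Hypothetical stand-in for the stripper; a List<String> stands in for the page tree.
class PagesParamModel {
    private List<String> document;                  // stays null until getText(...) runs
    private final StringBuilder out = new StringBuilder();

    String getText(List<String> doc) {
        document = doc;                             // the only place the field is assigned
        out.setLength(0);
        processPages(doc);
        return out.toString();
    }

    void processPages(List<String> pages) {
        // Items 1 and 2: the argument is ignored and the field is read instead,
        // so a direct call on a fresh instance fails on the null field.
        for (String page : document) {
            out.append(page);
        }
    }

    // One possible alternative: iterate over what the caller passed in.
    void processPagesUsingParam(List<String> pages) {
        for (String page : pages) {
            out.append(page);
        }
    }

    String output() {
        return out.toString();
    }
}
```

In this model, calling processPages on a fresh instance throws a NullPointerException, while processPagesUsingParam works without getText having been called.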

3.
processPage(…) doesn’t appear to do anything unless it’s called from 
processPages(…), because currentPageNo is not set if you call 
processPage(…) directly.
This method probably can’t be made private because it’s an override, but should 
it either drop the check on currentPageNo or else throw an exception / 
log a warning?
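
A minimal sketch of what I mean in item 3, again as a simplified model and not PDFBox source. PageGuardModel and processPageStrict are hypothetical names, the exact form of the range check is an assumption, and I am assuming a start page of 1 with currentPageNo starting at 0.

```java
import java.util.List;

// Hypothetical stand-in; currentPageNo mirrors the field named above, the rest is assumed.
class PageGuardModel {
    private int currentPageNo = 0;                  // advanced only inside processPages
    private final int startPage = 1;                // assumed default: pages are 1-based
    private final int endPage = Integer.MAX_VALUE;
    private final StringBuilder out = new StringBuilder();

    void processPages(List<String> pages) {
        for (String page : pages) {
            currentPageNo++;
            processPage(page);
        }
    }

    void processPage(String page) {
        // Called directly, currentPageNo is still 0, so the range check
        // fails and the page is skipped without any error or warning.
        if (currentPageNo >= startPage && currentPageNo <= endPage) {
            out.append(page);
        }
    }

    // One of the options raised above: fail loudly when no page number was ever set.
    void processPageStrict(String page) {
        if (currentPageNo == 0) {
            throw new IllegalStateException("processPage called outside processPages");
        }
        if (currentPageNo >= startPage && currentPageNo <= endPage) {
            out.append(page);
        }
    }

    String output() {
        return out.toString();
    }
}
```

Here a direct processPage call leaves the output empty, whereas the strict variant at least tells the caller why nothing happened.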

Cheers,

Britt


Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com
