[ https://issues.apache.org/jira/browse/PDFBOX-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-726.
---------------------------------------
Resolution: Fixed
IMHO manipulating the current page number directly seems a little too risky,
as doing so could break the whole extraction process.
But I think I found another working solution: PDFTextStripper now overrides the
resetEngine method and resets the current page number every time that method
is called.
I've committed the changes in revision 956354. Thanks to Ryan for the hint.
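
To illustrate the idea, here is a minimal sketch of how a reset hook can replace direct write access to the page counter. It does not use the real PDFBox classes: EngineSketch, StripperSketch, TwoPassStripperSketch, processPage and runTwoPasses are hypothetical stand-ins, and only the names resetEngine, currentPageNo and getCurrentPageNo are taken from this issue. The actual committed change may well look different.

```java
// Hypothetical stand-in for the engine base class (not the PDFBox API).
class EngineSketch {
    // Called before a fresh run; a real engine would clear its own state here.
    public void resetEngine() {
    }
}

// Hypothetical stand-in for the text stripper.
class StripperSketch extends EngineSketch {
    // Stays private: subclasses can read it but never set it directly.
    private int currentPageNo = 0;

    @Override
    public void resetEngine() {
        super.resetEngine();
        currentPageNo = 0; // the reset described in the comment above
    }

    protected int getCurrentPageNo() {
        return currentPageNo;
    }

    // Stand-in for per-page processing: just advances the counter.
    protected void processPage() {
        currentPageNo++;
    }
}

// A subclass performing a two-pass extraction without touching the field.
class TwoPassStripperSketch extends StripperSketch {
    // Returns the page number reached at the end of each pass.
    int[] runTwoPasses(int pageCount) {
        int[] lastPageSeen = new int[2];
        for (int pass = 0; pass < 2; pass++) {
            resetEngine(); // counter starts from 0 again on every pass
            for (int i = 0; i < pageCount; i++) {
                processPage();
            }
            lastPageSeen[pass] = getCurrentPageNo();
        }
        return lastPageSeen;
    }
}
```

In this sketch the subclass only ever calls resetEngine, so the counter cannot be set to an arbitrary value in the middle of an extraction, which is the risk mentioned above.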
> PDFTextStripper: allow access to currentPageNo variable
> -------------------------------------------------------
>
> Key: PDFBOX-726
> URL: https://issues.apache.org/jira/browse/PDFBOX-726
> Project: PDFBox
> Issue Type: Improvement
> Components: Text extraction
> Affects Versions: 1.1.0
> Reporter: Ryan Nideffer
> Fix For: 1.2.0
>
>
> I've extended org.apache.pdfbox.util.PDFTextStripper and I'm using it to
> perform a 2-pass extraction over a document. However, the second pass doesn't
> happen because I am unable to alter the variable currentPageNo, which
> maintains the current page number in the PDF document. The variable is
> declared private, and only a get method is provided.
> The only time currentPageNo is set to 0 is via 'writePage(PDDocument,
> OutputStream)' which I am overriding/not calling.
> 2 possible resolutions:
> - make currentPageNo protected instead of private (preferred)
> - add setCurrentPageNo method
> Thank you,
> Ryan