[
https://issues.apache.org/jira/browse/PDFBOX-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr reassigned PDFBOX-2792:
---------------------------------------
Assignee: Tilman Hausherr
> Text extraction ignores bookmarks
> ---------------------------------
>
> Key: PDFBOX-2792
> URL: https://issues.apache.org/jira/browse/PDFBOX-2792
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.9, 2.0.0
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
>
> As reported by Noam S. on the user mailing list:
> {quote}
> My problem is that when trying to getText(doc) from a certain section of the
> pdf using setStartBookmark(item) and setEndBookmark(item) I get all the text
> rather than just the text from the specified section.
> While trying to resolve this I realized that the writeText(doc, outputStream)
> method always calls the resetEngine() method. That will reset all the parameters
> and delete the bookmarks I set.
> {quote}
> The two lines that reset the bookmarks were added to resetEngine in
> PDFBOX-1808 in [ https://svn.apache.org/r1553175 ] in an attempt to save some
> memory.
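
The failure mode described above can be modelled with a small self-contained sketch. This is not the PDFBox source: ToyStripper, its integer page-index "bookmarks", the inclusive end index and the resetClearsBookmarks flag are all hypothetical stand-ins, used only to show how a reset at the start of extraction discards a range the caller set beforehand.

```java
import java.util.List;

// Hypothetical, simplified model of the reported behaviour; NOT PDFBox code.
final class BookmarkResetSketch {

    /** Toy stand-in for a text stripper; each "page" is a plain string. */
    static final class ToyStripper {
        // Stand-ins for the start/end bookmarks (null means "not set").
        private Integer startPage;
        private Integer endPage;
        // Assumption for illustration: toggles whether the reset wipes the range.
        private final boolean resetClearsBookmarks;

        ToyStripper(boolean resetClearsBookmarks) {
            this.resetClearsBookmarks = resetClearsBookmarks;
        }

        void setStartBookmark(int pageIndex) { startPage = pageIndex; }

        void setEndBookmark(int pageIndex) { endPage = pageIndex; }

        // Models the reset that runs at the start of every extraction.
        private void resetEngine() {
            if (resetClearsBookmarks) {
                startPage = null; // the caller's range is lost here
                endPage = null;
            }
        }

        String getText(List<String> pages) {
            resetEngine();
            int from = (startPage == null) ? 0 : startPage;
            int to = (endPage == null) ? pages.size() - 1 : endPage; // inclusive
            return String.join("\n", pages.subList(from, to + 1));
        }
    }

    public static void main(String[] args) {
        List<String> pages = List.of("intro", "chapter 1", "chapter 2", "appendix");

        ToyStripper buggy = new ToyStripper(true);
        buggy.setStartBookmark(1);
        buggy.setEndBookmark(2);
        System.out.println(buggy.getText(pages)); // range ignored, all pages returned

        ToyStripper fixed = new ToyStripper(false);
        fixed.setStartBookmark(1);
        fixed.setEndBookmark(2);
        System.out.println(fixed.getText(pages)); // only the selected pages
    }
}
```

In this model the only difference between the two runs is whether the reset touches the bookmark fields, which matches the report that the bookmarks are cleared before they are ever consulted.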