[
https://issues.apache.org/jira/browse/PDFBOX-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986801#comment-14986801
]
Tilman Hausherr commented on PDFBOX-3079:
-----------------------------------------
Did you look at PDFDebugger? You can use the destination parameters to define
area bounds. For the destination I mentioned, the contents are page 2, XYZ, 69,
701, 0 (an XYZ destination lists left, top, zoom), so the coordinates are
x=69 and y=701.
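A minimal sketch of turning two destinations on the same page into an area, assuming that the next bookmark's destination gives the lower edge of the section and that the page's user space starts at (0,0) in the bottom-left corner. PDF destinations measure y upward from the bottom, while PDFBox 2.0's `TextPosition.getY()` is documented as measured from the upper left, so the sketch flips y; treat that correspondence with `PDFTextStripperByArea` regions as an assumption to verify. `BookmarkRegion` and `regionBetween` are hypothetical names, and the page size and the second "top" value are made up for illustration.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper: converts the "top" values of two XYZ destinations on the
// same page into a full-width band in top-left-origin coordinates.
final class BookmarkRegion {

    /**
     * @param pageWidth  page width in points
     * @param pageHeight page height in points
     * @param startTop   "top" of the section's own destination (PDF user space, y up)
     * @param endTop     "top" of the next destination on the same page, or 0 to
     *                   run to the bottom of the page
     * @return the band with y measured downward from the top of the page
     */
    static Rectangle2D regionBetween(float pageWidth, float pageHeight,
                                     float startTop, float endTop) {
        float height = startTop - endTop;   // band height in points
        if (height <= 0) {
            throw new IllegalArgumentException("end destination must lie below start");
        }
        float y = pageHeight - startTop;    // flip y: distance from the top edge
        return new Rectangle2D.Float(0, y, pageWidth, height);
    }

    public static void main(String[] args) {
        // Destination from the comment: XYZ, 69, 701, 0 -> top = 701.
        // The 612 x 792 pt page size and the next bookmark's top of 450
        // are hypothetical values.
        Rectangle2D region = regionBetween(612f, 792f, 701f, 450f);
        System.out.println(region);
    }
}
```

The rectangle could then be handed to `PDFTextStripperByArea.addRegion(name, rect)`, followed by `extractRegions(page)` and `getTextForRegion(name)`. Because x=69 only marks where the heading starts, the sketch spans the full page width rather than starting the band at x=69.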
> Extracting text between bookmarks not working
> ---------------------------------------------
>
> Key: PDFBOX-3079
> URL: https://issues.apache.org/jira/browse/PDFBOX-3079
> Project: PDFBox
> Issue Type: Improvement
> Components: Text extraction
> Affects Versions: 2.0.0
> Environment: Windows
> Reporter: rey bernal
> Labels: textextraction
> Attachments: Test.java, test.pdf
>
>
> org.apache.pdfbox.text.PDFTextStripper does not really support extraction of
> content between bookmarks. From looking at the code in
> pdfbox-parent/pdfbox/src/main/java/org/apache/pdfbox/text/PDFTextStripper.java
> it is clear that it uses the bookmarks the user provides only to determine
> the pages to extract content from.
> There is a business need to extract the text that lies strictly between
> bookmarks. Refer to the attached example program and sample file.
> Extracting any of the sections on the first page returns the entire first
> page instead of only the content under that bookmark.
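Text "strictly between bookmarks" can also cross page boundaries. A hedged sketch of splitting one section into per-page bands, assuming each bookmark has already been resolved to an XYZ destination with a zero-based page index and a "top" value, and that each page's y axis runs from 0 at the bottom to the page height at the top. `SectionSlicer`, `Dest`, and `Slice` are hypothetical types for illustration, not PDFBox classes; records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the text between two consecutive bookmarks
// into one vertical band per page.
final class SectionSlicer {

    /** A resolved XYZ destination: zero-based page index and "top" in PDF user space. */
    record Dest(int page, float top) {}

    /** A band on one page in PDF user space (upperY > lowerY, y grows upward). */
    record Slice(int page, float upperY, float lowerY) {}

    /**
     * @param start       destination of the section's bookmark
     * @param end         destination of the next bookmark, or null for the last one
     * @param pageHeights height of each page in points, indexed by page
     */
    static List<Slice> slices(Dest start, Dest end, float[] pageHeights) {
        List<Slice> out = new ArrayList<>();
        int stopPage = (end == null) ? pageHeights.length - 1 : end.page();
        for (int p = start.page(); p <= stopPage; p++) {
            // The first page starts at the bookmark; later pages start at the top edge.
            float upper = (p == start.page()) ? start.top() : pageHeights[p];
            // The next bookmark's page ends at its top; other pages run to the bottom.
            float lower = (end != null && p == end.page()) ? end.top() : 0f;
            if (upper > lower) {            // skip empty bands
                out.add(new Slice(p, upper, lower));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] heights = {792f, 792f, 792f};   // hypothetical 3-page document
        Dest a = new Dest(0, 701f);
        Dest b = new Dest(0, 450f);
        Dest c = new Dest(1, 600f);
        System.out.println(slices(a, b, heights));    // same page: one band
        System.out.println(slices(b, c, heights));    // crosses a page break: two bands
        System.out.println(slices(c, null, heights)); // last bookmark: runs to the end
    }
}
```

Each band would still need its y values flipped into top-left coordinates before being registered as a region for its page; the sketch keeps PDF user space so the values line up with the destination parameters as PDFDebugger shows them.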
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)