rey bernal created PDFBOX-3079:
----------------------------------

             Summary: Extracting text between bookmarks not working
                 Key: PDFBOX-3079
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3079
             Project: PDFBox
          Issue Type: Improvement
          Components: Text extraction
    Affects Versions: 2.0.0
         Environment: Windows
            Reporter: rey bernal
            Priority: Critical
             Fix For: 2.0.0


org.apache.pdfbox.text.PDFTextStripper does not really support extracting the 
content between bookmarks. From the code in 
pdfbox-parent/pdfbox/src/main/java/org/apache/pdfbox/text/PDFTextStripper.java 
it is clear that the bookmarks the user provides are used only to determine 
which pages to extract content from.

There is a business need to extract only the text that lies strictly between 
two bookmarks. Refer to the attached example program and sample file.
Extracting any of the sections on the first page returns the entire first 
page instead of just the content under that bookmark.
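
To make the difference concrete, here is a minimal toy model of the two 
behaviours. It is not PDFBox code and does not use the PDFBox API: the class 
BookmarkRangeSketch, the Bookmark type, and both methods are hypothetical, and 
it assumes a bookmark can be reduced to a page index plus a line index on 
that page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only (hypothetical names, no PDFBox classes involved).
// A document is a list of pages; a page is a list of text lines.
class BookmarkRangeSketch {

    // Assumed simplification: a bookmark points at a zero-based page
    // and a zero-based line on that page.
    static final class Bookmark {
        final int page;
        final int line;

        Bookmark(int page, int line) {
            this.page = page;
            this.line = line;
        }
    }

    // Page-granular extraction, as described in this report: the bookmarks
    // only select the page range, so whole pages are returned.
    static List<String> extractByPage(List<List<String>> pages,
                                      Bookmark start, Bookmark end) {
        List<String> out = new ArrayList<>();
        for (int p = start.page; p <= end.page; p++) {
            out.addAll(pages.get(p));
        }
        return out;
    }

    // Requested behaviour: text from the start bookmark (inclusive)
    // up to the end bookmark (exclusive).
    static List<String> extractStrict(List<List<String>> pages,
                                      Bookmark start, Bookmark end) {
        List<String> out = new ArrayList<>();
        for (int p = start.page; p <= end.page; p++) {
            List<String> lines = pages.get(p);
            int from = (p == start.page) ? start.line : 0;
            int to = (p == end.page) ? end.line : lines.size();
            out.addAll(lines.subList(from, to));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("Section A", "text a", "Section B", "text b"),
                Arrays.asList("Section C", "text c"));
        Bookmark sectionA = new Bookmark(0, 0);
        Bookmark sectionB = new Bookmark(0, 2);

        System.out.println(extractByPage(pages, sectionA, sectionB));
        System.out.println(extractStrict(pages, sectionA, sectionB));
    }
}
```

In this model, extracting from "Section A" to "Section B" by page returns all 
four lines of the first page, while the strict variant returns only the two 
lines that belong to "Section A". The second result is what this issue asks for.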



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
