arjunce created PDFBOX-5857:
-------------------------------

             Summary: PDFTextStripper returns messed up data 
                 Key: PDFBOX-5857
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5857
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 3.0.2 PDFBox
            Reporter: arjunce
         Attachments: extractedText.txt, jumbledtext.pdf

I have attached the input PDF (jumbledtext.pdf) and its extracted text 
(extractedText.txt) for you to take a look at. I am using PDFTextStripper with these settings:
{code:java}
// Inside the constructor of a PDFTextStripper subclass
super();
this.setSortByPosition(true);
this.setWordSeparator("_word_");
{code}
Because I am sorting by position, the extracted text comes out jumbled. Is there 
a way to detect this case instead of outputting the jumbled text? Any help is 
appreciated.
Thanks.
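The report does not mention a built-in PDFBox check for this, so one option is a heuristic applied to the string that PDFTextStripper returns. Below is a minimal sketch, assuming that jumbled output shows up as many one-character fragments between word separators. The class name JumbleCheck, the single-character rule, and the 0.5 threshold are all hypothetical and are not PDFBox behavior. Only "_word_" comes from the settings above.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical heuristic (not part of PDFBox): flags extracted text as
 * "likely jumbled" when most tokens are single characters.
 */
final class JumbleCheck {

    // Assumed threshold; tune it against your own documents.
    static final double MAX_SHORT_TOKEN_RATIO = 0.5;

    /** Splits extracted text on the custom word separator and on whitespace. */
    static List<String> tokens(String text, String wordSeparator) {
        List<String> result = new ArrayList<>();
        String splitRegex = Pattern.quote(wordSeparator) + "|\\s+";
        for (String part : text.split(splitRegex)) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    /** Fraction of tokens that consist of exactly one code point. */
    static double shortTokenRatio(String text, String wordSeparator) {
        List<String> toks = tokens(text, wordSeparator);
        if (toks.isEmpty()) {
            return 0.0;
        }
        long shortCount = toks.stream()
                .filter(t -> t.codePointCount(0, t.length()) == 1)
                .count();
        return (double) shortCount / toks.size();
    }

    /** True when the share of one-character tokens exceeds the threshold. */
    static boolean looksJumbled(String text, String wordSeparator) {
        return shortTokenRatio(text, wordSeparator) > MAX_SHORT_TOKEN_RATIO;
    }

    public static void main(String[] args) {
        // Made-up sample strings, for illustration only.
        String clean = "Invoice_word_number_word_12345\nTotal_word_due_word_100.00";
        String jumbled = "I_word_n_word_v_word_o_word_ice_word_1_word_2_word_3";
        System.out.println(looksJumbled(clean, "_word_"));   // false
        System.out.println(looksJumbled(jumbled, "_word_")); // true
    }
}
{code}

In practice, the result of getText() would be passed to looksJumbled(...) before the text is used. A ratio test like this is cheap and does not depend on the PDF's internals, but it can misfire on legitimate content that is mostly short tokens, such as tables of single digits.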



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
