[ https://issues.apache.org/jira/browse/TIKA-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905461#comment-16905461 ]
Tim Allison commented on TIKA-2918:
-----------------------------------

Sorry, if sortByPosition doesn't work for you, you _might_ need to use a custom subclass of PDFTextStripper. For example, see:
https://apache.googlesource.com/pdfbox/+/trunk/examples/src/main/java/org/apache/pdfbox/examples/util/DrawPrintTextLocations.java

You might also ask on us...@pdfbox.apache.org. They may well know of methods to get this right more generally.

Sorry!

> Extracted text in wrong order
> -----------------------------
>
>                 Key: TIKA-2918
>                 URL: https://issues.apache.org/jira/browse/TIKA-2918
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.20
>            Reporter: jacob
>            Priority: Major
>         Attachments: extracted.txt, issue-screenshot.png, sample2.pdf, sortByPosition.txt
>
>
> When I extract the text from the attached PDF, the text is in the wrong order.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
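sortByPosition and a custom PDFTextStripper subclass share one idea. Both reorder extracted text by where it sits on the page, not by the order it appears in the PDF content stream. The stdlib-only Java sketch below illustrates that idea under stated assumptions:

* `Fragment` is a hypothetical stand-in for PDFBox's `TextPosition`.
* `lineTolerance` is a hypothetical parameter.
* The heuristic is a simplification, not PDFBox's actual sorting logic. It treats fragments whose y coordinates differ by at most the tolerance as one line, then orders each line left to right.

In a real subclass, the coordinates would come from PDFBox itself. The linked DrawPrintTextLocations example does this by overriding `writeString(String, List<TextPosition>)` and reading positions such as `getXDirAdj()`/`getYDirAdj()` (assuming PDFBox 2.x).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Stdlib-only illustration of position-based text reordering.
 * Fragment is a HYPOTHETICAL stand-in for PDFBox's TextPosition;
 * this is not the PDFBox or Tika API.
 */
class PositionSort {

    /** Hypothetical fragment: a piece of text placed at (x, y), with y growing downward. */
    static final class Fragment {
        final String text;
        final float x;
        final float y;

        Fragment(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Groups fragments into lines (a fragment joins the current line if its y is
     * within lineTolerance of that line's first fragment), orders lines top to
     * bottom and fragments left to right, then joins with spaces and newlines.
     */
    static String reorder(List<Fragment> fragments, float lineTolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        // Top to bottom first; ties broken left to right.
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y)
                .thenComparingDouble(f -> f.x));

        List<List<Fragment>> lines = new ArrayList<>();
        for (Fragment f : sorted) {
            List<Fragment> current = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (current != null && Math.abs(f.y - current.get(0).y) <= lineTolerance) {
                current.add(f); // close enough vertically: same visual line
            } else {
                List<Fragment> line = new ArrayList<>();
                line.add(f);
                lines.add(line); // start a new visual line
            }
        }

        List<String> rendered = new ArrayList<>();
        for (List<Fragment> line : lines) {
            // Slightly different y values can scramble x order, so re-sort each line by x.
            line.sort(Comparator.comparingDouble(f -> f.x));
            List<String> words = new ArrayList<>();
            for (Fragment f : line) {
                words.add(f.text);
            }
            rendered.add(String.join(" ", words));
        }
        return String.join("\n", rendered);
    }
}
```

Suppose the fragments arrive as ("world", 50, 10), ("Hello", 10, 10.3), ("second", 10, 30), ("line", 60, 30) and `lineTolerance` is 2. `reorder` then returns "Hello world" on the first line and "second line" on the second. A purely row-based sort like this interleaves the columns of a multi-column page, which is one reason a position sort alone may not fix every PDF.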