Strange behavior in TextPositionComparator ------------------------------------------
Key: PDFBOX-1170 URL: https://issues.apache.org/jira/browse/PDFBOX-1170 Project: PDFBox Issue Type: Bug Components: Text extraction Affects Versions: 1.6.0, 1.7.0 Environment: Windows Reporter: Sébastien Dailly Priority: Minor When extracting text for the pdf (see attachement) with setSortByPosition(true), the output does not follow nor the visual position of the elements, nor the document structure. Here is the output of PDfTextStripper : 11111 333333333333333 : 222222222 The expected output would be : 11111 : 222222222 333333333333333 The string « 11111 : » is defined in only one instruction : [(1) -9.555729866 (1) 17.5939998627 (1) 3.5597500801 (1) 1.9403500557 (1) 4.1794600487 ( ) -0.1493600011 (:) -4.7775301933 ( ) 250 ] TJ How explain that the 3... is inserted inside ? (Note : the pdf has been deflated and edited for « anonymising » the text. I also removed a picture, wich explain the XRef error ) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira