Vitalie Bureanu created PDFBOX-1553: ---------------------------------------
Summary: Offset of extracted coordinates
Key: PDFBOX-1553
URL: https://issues.apache.org/jira/browse/PDFBOX-1553
Project: PDFBox
Issue Type: Bug
Affects Versions: 1.8.0
Environment: Linux Ubuntu 64 bit, Java
Reporter: Vitalie Bureanu

Hello,

Preamble: We are glad to use PDFBox, and I am personally grateful to all the developers who sustain this project. You are doing good work, guys!

We have one problem. For our application, we extract text from PDFs "char by char", together with the respective coordinates of each char (see the attached Parser). We then group the chars into words.

We noticed that for some PDF documents the extracted coordinates have a strange "offset" (see the screenshots). The offset grows across the page: at the top-left corner of the document the extracted coordinates are close to the real coordinates of the character, but at the bottom-right corner they are off by about 0.5 cm. If I make a selection in Adobe Reader, everything looks correct.

I have attached two PDF files that show the offset. If you want to see the offset "in action", you can use our service at http://pdf2data.cloudforpeople.com/ (please do not take this as advertising).
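The "group chars into words" step can be illustrated with a minimal sketch. This is not the attached Parser: `CharBox`, `WordGrouper`, `maxGap` and `yTolerance` are hypothetical names, and the sketch assumes each character already carries an x position, a baseline y and a width in page units (for example values taken from the `TextPosition` objects a `PDFTextStripper` subclass receives), listed in reading order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-character record (not a PDFBox class): one glyph plus its
// position in page coordinates, e.g. PDF points.
final class CharBox {
    final char ch;
    final float x;     // left edge of the glyph
    final float y;     // baseline position
    final float width; // advance width of the glyph

    CharBox(char ch, float x, float y, float width) {
        this.ch = ch;
        this.x = x;
        this.y = y;
        this.width = width;
    }
}

// Hypothetical helper that groups characters into words.
final class WordGrouper {
    private WordGrouper() {}

    // Assumes chars are already in reading order. A character continues the
    // current word when it sits on the same baseline (within yTolerance) and
    // the horizontal gap after the previous glyph is at most maxGap.
    // Whitespace characters always end the current word.
    static List<String> group(List<CharBox> chars, float maxGap, float yTolerance) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        CharBox prev = null;
        for (CharBox c : chars) {
            if (Character.isWhitespace(c.ch)) {
                flush(current, words);
                prev = null;
                continue;
            }
            boolean continuesWord = prev != null
                    && Math.abs(c.y - prev.y) <= yTolerance
                    && c.x - (prev.x + prev.width) <= maxGap;
            if (!continuesWord) {
                flush(current, words);
            }
            current.append(c.ch);
            prev = c;
        }
        flush(current, words);
        return words;
    }

    // Moves the buffered word (if any) into the result list and clears the buffer.
    private static void flush(StringBuilder current, List<String> words) {
        if (current.length() > 0) {
            words.add(current.toString());
            current.setLength(0);
        }
    }
}
```

The grouping only looks at gaps between neighboring characters, so it is only as accurate as the coordinates fed into it; an offset that grows across the page mainly shifts where the resulting words appear rather than how the characters are grouped.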