[ https://issues.apache.org/jira/browse/PDFBOX-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vitalie Bureanu updated PDFBOX-1553:
------------------------------------

    Attachment: Selection in Adobe Reader.png
                Extracted coordinates of rects.jpg
                Parser.java
                EnSt11_offset.pdf
                EnSt10_offset.pdf

> Offset of extracted coordinates
> -------------------------------
>
>                 Key: PDFBOX-1553
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1553
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>         Environment: Linux Ubuntu 64 bit, Java
>            Reporter: Vitalie Bureanu
>              Labels: offset
>         Attachments: EnSt10_offset.pdf, EnSt11_offset.pdf, Extracted coordinates of rects.jpg, Parser.java, Selection in Adobe Reader.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Hello,
> Preamble: We are glad to use PDFBox, and I am personally grateful to all
> the developers who sustain this project. It is good work, guys!
> We have one problem. For our application we extract text from PDF files
> character by character, with the respective coordinates of each character
> (see the attached Parser.java). After this we group the characters into words.
> We noticed that for some PDF documents the extracted coordinates have a
> strange offset (see the attached screenshots).
> The offset is incremental: at the top-left corner of the document the
> extracted coordinates are close to the real character positions, but at the
> bottom-right corner the offset is about 0.5 cm.
> If I make a selection in Adobe Reader, everything looks correct.
> I have attached two PDF files that show the offset to this post.
> If you want to see the offset "in action", you can use our service at
> http://pdf2data.cloudforpeople.com/ (please do not consider this advertising).
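
For readers who have not opened the attachment, the "group chars into the words" step described in the report could look roughly like the sketch below. This is not the attached Parser.java and it does not call PDFBox. The Glyph class, the groupIntoWords method and the maxGap and lineTolerance parameters are all hypothetical names chosen for illustration, and the sketch assumes the characters arrive in reading order with coordinates in PDF points.

```java
import java.util.ArrayList;
import java.util.List;

public class WordGrouper {

    /** Hypothetical holder for one extracted character and its position. */
    public static final class Glyph {
        final String text;
        final float x;      // left edge of the character
        final float y;      // baseline of the character
        final float width;  // advance width of the character

        public Glyph(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Joins consecutive characters into words. A word ends at a whitespace
     * character, at a horizontal gap larger than maxGap, or when the
     * baseline moves by more than lineTolerance.
     */
    public static List<String> groupIntoWords(List<Glyph> glyphs,
                                              float maxGap,
                                              float lineTolerance) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Glyph prev = null;

        for (Glyph g : glyphs) {
            boolean isSpace = g.text.trim().isEmpty();
            boolean sameLine = prev != null
                    && Math.abs(g.y - prev.y) <= lineTolerance;
            boolean adjacent = sameLine
                    && (g.x - (prev.x + prev.width)) <= maxGap;

            // Close the current word before starting a new one.
            if (current.length() > 0 && (isSpace || !adjacent)) {
                words.add(current.toString());
                current.setLength(0);
            }

            if (isSpace) {
                prev = null;   // the next character starts a fresh word
            } else {
                current.append(g.text);
                prev = g;
            }
        }

        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }
}
```

Because the grouping compares each character only with its neighbour, an offset that grows slowly across the page would still produce correct words. It would show up only when the word rectangles are drawn over the rendered page, which matches the symptom described in the report.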