[ https://issues.apache.org/jira/browse/PDFBOX-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vitalie Bureanu updated PDFBOX-1553:
------------------------------------
Attachment: Selection in Adobe Reader.png
Extracted coordinates of rects.jpg
Parser.java
EnSt11_offset.pdf
EnSt10_offset.pdf
> Offset of extracted coordinates
> -------------------------------
>
> Key: PDFBOX-1553
> URL: https://issues.apache.org/jira/browse/PDFBOX-1553
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.8.0
> Environment: Linux Ubuntu 64 bit, Java
> Reporter: Vitalie Bureanu
> Labels: offset
> Attachments: EnSt10_offset.pdf, EnSt11_offset.pdf, Extracted
> coordinates of rects.jpg, Parser.java, Selection in Adobe Reader.png
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Hello,
> Preamble: We are glad to use PDFBox, and I am personally grateful to all
> the developers who sustain this project. It is good work, guys!
> We have one problem. For our application we extract text from PDFs "char
> by char", with the respective coordinates for each char (see the attached
> Parser.java). After this we group the chars into words. We noticed that for
> some PDF documents the extracted coordinates have a strange "offset" (see
> the attached screenshots).
> The offset is incremental: near the top-left corner of the document the
> extracted coordinates are close to the real character positions, but near
> the bottom-right corner they are off by about 0.5 cm.
> If I select the same text in Adobe Reader, everything looks correct.
> I have attached two PDF files that show the offset.
> If you want to see the offset "in action" you can use our service to do it at
> http://pdf2data.cloudforpeople.com/ (Please do not consider it as advertising)
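
The "group chars into words" step described in the report could look roughly like the sketch below. It is not the attached Parser.java and it calls no PDFBox API: the Glyph and Word types, the group method, and the maxGap and lineTolerance thresholds are all hypothetical names chosen for illustration. It assumes the characters arrive in reading order with an x/y position and a width in the same units.

```java
import java.util.ArrayList;
import java.util.List;

public class WordGrouper {

    /** One extracted character and its position (hypothetical type, not a PDFBox class). */
    static final class Glyph {
        final char ch;
        final float x;
        final float y;
        final float width;

        Glyph(char ch, float x, float y, float width) {
            this.ch = ch;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /** A run of adjacent characters with the bounds of the whole run. */
    static final class Word {
        final String text;
        final float x;
        final float y;
        final float width;

        Word(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Groups glyphs (assumed to be in reading order) into words. A word ends at
     * a whitespace glyph, at a change of line (y differs by more than
     * lineTolerance), or at a horizontal gap wider than maxGap.
     */
    static List<Word> group(List<Glyph> glyphs, float maxGap, float lineTolerance) {
        List<Word> words = new ArrayList<Word>();
        StringBuilder text = new StringBuilder();
        float startX = 0f;
        float startY = 0f;
        float endX = 0f;

        for (Glyph g : glyphs) {
            boolean space = Character.isWhitespace(g.ch);
            boolean open = text.length() > 0;
            boolean breaks = open && (space
                    || Math.abs(g.y - startY) > lineTolerance
                    || g.x - endX > maxGap);
            if (breaks) {
                words.add(new Word(text.toString(), startX, startY, endX - startX));
                text.setLength(0);
            }
            if (space) {
                continue; // whitespace only separates words
            }
            if (text.length() == 0) {
                startX = g.x; // first glyph of a new word
                startY = g.y;
            }
            text.append(g.ch);
            endX = g.x + g.width;
        }
        if (text.length() > 0) {
            words.add(new Word(text.toString(), startX, startY, endX - startX));
        }
        return words;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<Glyph>();
        glyphs.add(new Glyph('H', 10f, 100f, 6f));
        glyphs.add(new Glyph('i', 16f, 100f, 3f));
        glyphs.add(new Glyph(' ', 19f, 100f, 3f));
        glyphs.add(new Glyph('P', 22f, 100f, 6f));
        glyphs.add(new Glyph('D', 28f, 100f, 6f));
        glyphs.add(new Glyph('F', 34f, 100f, 6f));

        for (Word w : group(glyphs, 2f, 1f)) {
            System.out.println(w.text + " x=" + w.x + " y=" + w.y + " width=" + w.width);
        }
    }
}
```

Because each word's bounds are taken directly from the glyph positions, any error in the per-character coordinates carries over unchanged to the word rectangles, which would be consistent with the shifted rectangles in the attached screenshots.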
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira