Vitalie Bureanu created PDFBOX-1553:
---------------------------------------

             Summary: Offset of extracted coordinates
                 Key: PDFBOX-1553
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1553
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 1.8.0
         Environment: Linux Ubuntu 64 bit, Java
            Reporter: Vitalie Bureanu


Hello,

Preamble: We are glad to use PDFBox, and I am personally grateful to all the 
developers who sustain this project. It is good work, guys!

We have one problem. For our application we extract text from PDFs character 
by character, with the respective coordinates of each character (see attached 
Parser). We then group the characters into words. We noticed that for some PDF 
documents the extracted coordinates have a strange "offset" (see screenshots).
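
To make the grouping step concrete, here is a minimal sketch of how characters 
with coordinates might be merged into words. It is not the attached Parser: 
the Glyph holder, the WordGrouper class, and the maxGap and lineTolerance 
parameters are hypothetical names chosen for illustration, and the glyphs are 
assumed to arrive already sorted in reading order.

```java
import java.util.ArrayList;
import java.util.List;

final class WordGrouper {

    /** Hypothetical holder for one extracted character; not a PDFBox class. */
    static final class Glyph {
        final String text;
        final float x;      // left edge of the character, in page units
        final float y;      // baseline position, in page units
        final float width;  // advance width, in page units

        Glyph(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Groups glyphs into words. A new word starts at a whitespace glyph,
     * at a change of line, or when the horizontal gap to the previous
     * glyph exceeds maxGap.
     */
    static List<String> group(List<Glyph> glyphs, float maxGap, float lineTolerance) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            boolean whitespace = g.text.trim().isEmpty();
            boolean sameLine = prev != null && Math.abs(g.y - prev.y) <= lineTolerance;
            boolean adjacent = sameLine && (g.x - (prev.x + prev.width)) <= maxGap;
            if ((whitespace || !adjacent) && current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
            if (whitespace) {
                prev = null;            // the next glyph always starts a new word
            } else {
                current.append(g.text);
                prev = g;
            }
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }
}
```

Because the decision depends only on the gap between neighbouring glyphs, a 
coordinate offset that grows slowly across the page would mostly leave the 
words themselves intact while shifting their reported positions.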

The offset is incremental: at the top-left corner of the document the extracted 
coordinates are close to the real position of the character, but at the 
bottom-right corner they are off by nearly 0.5 cm.
If I make a selection in Adobe Reader, everything seems OK.
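
One way to quantify such an incremental offset is to fit a straight line 
between extracted and reference coordinates along one axis. The sketch below 
is only a diagnostic idea, not something taken from PDFBox or from the attached 
Parser: the OffsetFit class and its fit method are hypothetical names, and the 
reference positions would have to be measured by hand (for example, from a 
viewer). A fitted scale different from 1 would suggest the offset grows 
linearly with position; a non-zero shift alone would indicate a constant 
displacement.

```java
final class OffsetFit {

    /**
     * Least-squares fit of reference = scale * extracted + shift.
     * Returns a two-element array {scale, shift}.
     */
    static double[] fit(double[] extracted, double[] reference) {
        int n = extracted.length;
        if (n < 2 || reference.length != n) {
            throw new IllegalArgumentException("need at least two matching samples");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += extracted[i];
            meanY += reference[i];
        }
        meanX /= n;
        meanY /= n;

        double sxx = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = extracted[i] - meanX;
            sxx += dx * dx;
            sxy += dx * (reference[i] - meanY);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("extracted values must not all be equal");
        }
        double scale = sxy / sxx;
        double shift = meanY - scale * meanX;
        return new double[] { scale, shift };
    }
}
```

Running the fit separately for x and y coordinates of a few characters spread 
across the page would show whether both axes are affected in the same way.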

I have attached two PDF files that show the offset to this post.
If you want to see the offset "in action", you can try it with our service at 
http://pdf2data.cloudforpeople.com/ (please do not consider this advertising).




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
