Joel Hirsh created PDFBOX-2463:
----------------------------------

             Summary: ExtractTextByArea mangling second half of this string - transposed, skipped, etc
                 Key: PDFBOX-2463
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2463
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.8.7
            Reporter: Joel Hirsh


A PDF snippet is being completely mangled by ExtractTextByArea.  I have a large 
PDF file where this happens on every line.

Both the visual rendering and Acrobat show the text:
12 Jun EP COPY WORKS LIMITED 503646200256 5637 3.70 11,252.49 OD

However, ExtractTextByArea comes up with:
12 Jun EP COPY WORKS LIMITED 503646200256 35 .6 70
11,
3 257 2.49
OD

So the first half of the string is OK, but starting at '5637' the output is 
completely mangled: characters are skipped and other characters are inserted.
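
To pin down where the output goes wrong on other lines of a large file, a small 
stand-alone check along these lines may help.  This is only a sketch: 
DivergenceFinder, normalize and firstDivergence are hypothetical names, not part 
of PDFBox, and it assumes that collapsing line breaks and repeated spaces is an 
acceptable way to compare the two strings.

```java
// Hypothetical diagnostic helper (not part of PDFBox): reports where an
// extracted line first differs from the text shown by the viewer.
final class DivergenceFinder {
    private DivergenceFinder() {}

    /** Collapse every run of whitespace (including line breaks) to one space. */
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    /**
     * Returns the index of the first differing character after normalization,
     * or -1 if the two normalized strings are identical.
     */
    static int firstDivergence(String expected, String actual) {
        String e = normalize(expected);
        String a = normalize(actual);
        int n = Math.min(e.length(), a.length());
        for (int i = 0; i < n; i++) {
            if (e.charAt(i) != a.charAt(i)) {
                return i;
            }
        }
        // One string is a prefix of the other, or they are equal.
        return e.length() == a.length() ? -1 : n;
    }

    public static void main(String[] args) {
        // Strings copied from this report.
        String expected =
            "12 Jun EP COPY WORKS LIMITED 503646200256 5637 3.70 11,252.49 OD";
        String extracted =
            "12 Jun EP COPY WORKS LIMITED 503646200256 35 .6 70\n11,\n3 257 2.49\nOD";

        int i = firstDivergence(expected, extracted);
        if (i < 0) {
            System.out.println("Strings match after whitespace normalization");
            return;
        }
        System.out.println("Intact prefix:  " + normalize(expected).substring(0, i));
        System.out.println("Expected tail:  " + normalize(expected).substring(i));
        System.out.println("Extracted tail: " + normalize(extracted).substring(i));
    }
}
```

For the line quoted above, the expected tail it reports starts at '5637', which 
matches what is seen by eye.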

FWIW, I dumped the COSStrings in PDFStreamEngine and they all show correctly, 
with nothing unusual.

Test file to be attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
