Text extraction gibberish after ghostscript update
--------------------------------------------------

                 Key: PDFBOX-776
                 URL: https://issues.apache.org/jira/browse/PDFBOX-776
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox, Text extraction
    Affects Versions: 1.2.1
            Reporter: Kevin Pearcey
         Attachments: test-870.pdf, test-871.pdf

I have a test PDF document that is generated using ps2pdf from Ghostscript.

If I use Ghostscript 8.70, PDFBox extracts the text correctly.
If I use Ghostscript 8.71, PDFBox does not extract the text correctly (same
byte count, but gibberish characters).
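
To make the "same byte count, but gibberish" symptom concrete, here is a
minimal Java sketch of how the two extraction results could be compared. It
assumes the extracted text from each PDF is already available as a string. The
class name, method names, and sample strings are hypothetical and are not part
of PDFBox.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: compares two already-extracted strings.
final class ExtractionDiff {

    // Number of bytes in the UTF-8 encoding of the extracted text.
    static int byteCount(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of char positions at which the two strings differ; any
    // difference in length is counted as additional differing positions.
    static int differingPositions(String expected, String actual) {
        int common = Math.min(expected.length(), actual.length());
        int diff = Math.abs(expected.length() - actual.length());
        for (int i = 0; i < common; i++) {
            if (expected.charAt(i) != actual.charAt(i)) {
                diff++;
            }
        }
        return diff;
    }

    // True when the byte counts match but the content does not, which is
    // the symptom described in this report.
    static boolean sameSizeDifferentContent(String expected, String actual) {
        return byteCount(expected) == byteCount(actual) && !expected.equals(actual);
    }

    public static void main(String[] args) {
        // Made-up sample strings standing in for the two extraction results.
        String good = "Hello";
        String bad = "Ifmmp";
        System.out.println(sameSizeDifferentContent(good, bad));
        System.out.println(differingPositions(good, bad));
    }
}
```

A result where sameSizeDifferentContent is true and most positions differ
would suggest a character mapping problem rather than missing text.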

I will also note that I had to update poppler to 0.14 to get it to correctly
extract text from test-871.pdf; the previous version would only extract the
correct text from test-870.pdf.

Attached are the PDFs generated from the same original PostScript file, one
run through Ghostscript 8.70 and the other through 8.71.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.