Text extraction gibberish after ghostscript update
--------------------------------------------------
Key: PDFBOX-776
URL: https://issues.apache.org/jira/browse/PDFBOX-776
Project: PDFBox
Issue Type: Bug
Components: FontBox, Text extraction
Affects Versions: 1.2.1
Reporter: Kevin Pearcey
Attachments: test-870.pdf, test-871.pdf
I have a test PDF document that is generated using ps2pdf from Ghostscript.
If I use Ghostscript 8.70, PDFBox extracts the text correctly.
If I use Ghostscript 8.71, PDFBox does not extract the text correctly (same
byte count, but gibberish characters).
Note also that I had to update poppler to 0.14 to get it to extract the text
from test-871.pdf correctly; the previous version would only extract the
correct text from test-870.pdf.
Attached are the PDFs generated from the same original PostScript file,
one run through Ghostscript 8.70 and the other through Ghostscript 8.71.
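The "same byte count but gibberish characters" symptom can be checked mechanically once the text extracted from each attachment has been saved to a file. The following is only a minimal sketch using the Java standard library, not part of PDFBox: the class name ExtractionDiff, its helper methods, and the two file paths passed on the command line are all hypothetical, and it assumes both outputs were saved as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not a PDFBox class): compares two extracted-text
// outputs to see whether they have the same size but different characters.
public class ExtractionDiff {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int byteCount(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Counts positions at which the two strings differ. Characters that
    // exist in only one of the strings also count as differences.
    static int differingPositions(String expected, String actual) {
        int common = Math.min(expected.length(), actual.length());
        int diff = Math.abs(expected.length() - actual.length());
        for (int i = 0; i < common; i++) {
            if (expected.charAt(i) != actual.charAt(i)) {
                diff++;
            }
        }
        return diff;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ExtractionDiff <good.txt> <bad.txt>");
            return;
        }
        // Assumption: both files hold extracted text saved as UTF-8.
        String good = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String bad = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);

        System.out.println("bytes (good): " + byteCount(good));
        System.out.println("bytes (bad):  " + byteCount(bad));
        System.out.println("differing positions: " + differingPositions(good, bad));
    }
}
```

Equal byte counts combined with a large number of differing positions would be consistent with each character being mapped to a wrong code rather than text being dropped, though this check alone does not identify the cause.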