Stefan Postema created PDFBOX-2451:
--------------------------------------
Summary: Only gibberish extracted from certain PDF files
Key: PDFBOX-2451
URL: https://issues.apache.org/jira/browse/PDFBOX-2451
Project: PDFBox
Issue Type: Bug
Reporter: Stefan Postema
I was asked to report this bug here. Text extraction fails for certain
Dutch-language PDF files. The bug was originally reported as TIKA-1095
(https://issues.apache.org/jira/browse/TIKA-1095), and it can still be
reproduced with the latest Tika version.
The extracted text consists only of gibberish (or, in other cases, question
marks and incorrect characters).
It was suggested that this might be a font problem. Could someone look into it?