[jira] [Created] (PDFBOX-2451) Only gibberish extracted from certain PDF files

Stefan Postema (JIRA) Fri, 24 Oct 2014 01:01:07 -0700

Stefan Postema created PDFBOX-2451:
--------------------------------------

             Summary: Only gibberish extracted from certain PDF files
                 Key: PDFBOX-2451
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2451
             Project: PDFBox
          Issue Type: Bug
            Reporter: Stefan Postema



I was told to report this bug here. Text extraction fails for certain PDF 
files in Dutch. The bug was originally reported as TIKA-1095 
(https://issues.apache.org/jira/browse/TIKA-1095) and can be reproduced with 
the latest Tika version.

The extracted text contains only gibberish (or, in other cases, question marks 
and incorrect characters).

It was suggested that this could be a font problem. Could this be looked into?





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
