[ https://issues.apache.org/jira/browse/PDFBOX-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr closed PDFBOX-1783.
-----------------------------------
Resolution: Not a Problem
Acrobat Reader does not yield any text from this document either. See also the FAQ entry:
https://pdfbox.apache.org/userguide/faq.html#notext
> PDFBox extracts weird signs instead of text
> -------------------------------------------
>
> Key: PDFBOX-1783
> URL: https://issues.apache.org/jira/browse/PDFBOX-1783
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.2
> Environment: Linux, MacOSX
> Reporter: Marc Teutelink
> Labels: patch
> Attachments: gaatfout.pdf,
> plain_text_tika_output_from_gaat_fout_pdf.txt,
> structured_text_tika_output_from_gaat_fout_pdf.xml
>
>
> PDFBox extracts completely bogus text from the attached document. I have
> attached the PDF in question. I discovered this while using Tika, so I have
> also linked the corresponding TIKA Jira issue to this issue.