[ https://issues.apache.org/jira/browse/PDFBOX-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829786#comment-13829786 ]
Marc Teutelink commented on PDFBOX-1783:
----------------------------------------
TIKA-1199 is the corresponding Apache Tika issue.
> PdfBox extracts weird signs instead of text
> -------------------------------------------
>
> Key: PDFBOX-1783
> URL: https://issues.apache.org/jira/browse/PDFBOX-1783
> Project: PDFBox
> Issue Type: Bug
> Components: PDFReader
> Affects Versions: 1.8.2
> Environment: Linux, MacOSX
> Reporter: Marc Teutelink
> Labels: patch
> Attachments: gaatfout.pdf,
> plain_text_tika_output_from_gaat_fout_pdf.txt,
> structured_text_tika_output_from_gaat_fout_pdf.xml
>
>
> PDFBox extracts completely bogus text from the attached PDF
> (gaatfout.pdf). I discovered this while using Tika, so I have also linked
> the corresponding Tika JIRA issue (TIKA-1199) to this issue.