[ https://issues.apache.org/jira/browse/PDFBOX-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Hewson updated PDFBOX-1783:
--------------------------------
Component/s: Text extraction (was: Swing GUI)
> PdfBox extracts weird signs instead of text
> -------------------------------------------
>
> Key: PDFBOX-1783
> URL: https://issues.apache.org/jira/browse/PDFBOX-1783
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.2
> Environment: Linux, MacOSX
> Reporter: Marc Teutelink
> Labels: patch
> Attachments: gaatfout.pdf,
> plain_text_tika_output_from_gaat_fout_pdf.txt,
> structured_text_tika_output_from_gaat_fout_pdf.xml
>
>
> PDFBox extracts completely bogus text from the attached document. I have
> attached the PDF in question. I discovered this when using Tika, so I have
> linked the corresponding Tika JIRA issue to this issue as well.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)