[jira] [Closed] (PDFBOX-1783) PdfBox extracts werid signs instead of text

Tilman Hausherr (JIRA) Thu, 12 Jun 2014 07:09:25 -0700

     [ https://issues.apache.org/jira/browse/PDFBOX-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Tilman Hausherr closed PDFBOX-1783.
-----------------------------------

    Resolution: Not a Problem

No text can be extracted with Acrobat Reader either; see also here:
https://pdfbox.apache.org/userguide/faq.html#notext


> PdfBox extracts werid signs instead of text
> -------------------------------------------
>
>                 Key: PDFBOX-1783
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1783
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.2
>         Environment: Linux, MacOSX
>            Reporter: Marc Teutelink
>              Labels: patch
>         Attachments: gaatfout.pdf, 
> plain_text_tika_output_from_gaat_fout_pdf.txt, 
> structured_text_tika_output_from_gaat_fout_pdf.xml
>
>
> PDFBox extracts completely bogus text from the attached document. I have 
> attached the PDF in question. I discovered this when using Tika, so I have 
> linked the corresponding TIKA Jira issue to this issue as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
