[ 
https://issues.apache.org/jira/browse/TIKA-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256941#comment-16256941
 ] 

Tim Allison commented on TIKA-2505:
-----------------------------------

Adobe Reader's "Save as Text" yields garbled text for "original.pdf". The 
standalone PDFBox 2.0.8 app's "ExtractText" yields the same garbled text as 
Adobe Reader (or at least very similar, at a quick glance). I don't think 
there's much we can do, given that, as [~gagravarr] pointed out, the file is 
missing a Unicode mapping.

> Tika server output encoding problems
> ------------------------------------
>
>                 Key: TIKA-2505
>                 URL: https://issues.apache.org/jira/browse/TIKA-2505
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.16
>            Reporter: Fanni Kovacs
>         Attachments: original.pdf, response.txt, similar.pdf
>
>
> Hello,
> We noticed during a conversion of a large number of files that there are 
> some issues when we get a non-UTF-8 response from tika server 1.6.



