[ https://issues.apache.org/jira/browse/TIKA-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256941#comment-16256941 ]
Tim Allison commented on TIKA-2505:
-----------------------------------
Adobe Reader's "Save as Text" yields garbled text for "original.pdf". The
plain PDFBox 2.0.8 app's "ExtractText" yields the same garbled text as Adobe
Reader (or at least very similar, at a quick glance). I don't think there's
much we can do given that, as [~gagravarr] pointed out, the file is missing a
Unicode mapping.
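
To illustrate why a missing Unicode mapping leads to garbled output, here is a
minimal toy sketch in plain Java. It is not PDFBox's or Adobe Reader's actual
logic: the class name ToUnicodeSketch, the decode method, and the character
codes are all hypothetical, and the fallback (treating the raw code as a
Unicode code point) is only an assumption chosen to keep the example short.
Real extractors try other fallbacks, such as font encodings and glyph names.

```java
import java.util.HashMap;
import java.util.Map;

public class ToUnicodeSketch {

    // Decode font-specific character codes into text.
    // toUnicode stands in for a font's Unicode mapping (hypothetical model).
    // Assumed fallback: if a code has no mapping, use the raw code as a
    // Unicode code point, which generally produces garbage.
    static String decode(int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            String mapped = toUnicode.get(code);
            if (mapped != null) {
                sb.append(mapped);
            } else {
                sb.appendCodePoint(code);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up codes as an embedded, subsetted font might assign them.
        int[] codes = {1, 2, 3, 4};

        Map<Integer, String> mapping = new HashMap<>();
        mapping.put(1, "T");
        mapping.put(2, "i");
        mapping.put(3, "k");
        mapping.put(4, "a");

        // With the mapping present, the text is recoverable.
        System.out.println(decode(codes, mapping));

        // With no mapping, the same codes do not decode to the same text.
        String garbled = decode(codes, new HashMap<>());
        System.out.println(garbled.equals("Tika"));
    }
}
```

In this model the glyphs still render correctly on screen because the font
knows how to draw code 1 as "T", but a text extractor has nothing to translate
the codes back into characters, which would explain why different tools end up
with similar garbage.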
> Tika server output encoding problems
> ------------------------------------
>
> Key: TIKA-2505
> URL: https://issues.apache.org/jira/browse/TIKA-2505
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.16
> Reporter: Fanni Kovacs
> Attachments: original.pdf, response.txt, similar.pdf
>
>
> Hello,
> We noticed during a conversion of a large number of files that there are
> some issues when we get a non-UTF-8 response from tika server 1.6.