[ https://issues.apache.org/jira/browse/TIKA-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986442#comment-15986442 ]
Tim Allison commented on TIKA-2342:
-----------------------------------
Welcome to PDFs! This _may_ be fixable at the PDFBox level. See:
https://wiki.apache.org/tika/Troubleshooting%20Tika#PDF_Text_Problems
If you can reproduce this with pure PDFBox, please open an issue on their JIRA.
And, more generally, see:
https://wiki.apache.org/tika/PDFParser%20(Apache%20PDFBox)
> Broken words
> ------------
>
> Key: TIKA-2342
> URL: https://issues.apache.org/jira/browse/TIKA-2342
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.14
> Environment: Tika app and Tika server
> Reporter: Nino Skopac
>
> Original PDF text: "Each certified or noncertified member"
> Tika extracted text: "Each certifi ed or noncertifi ed member"
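
When filing this upstream, it may help to show exactly which words were split and to confirm that the two texts differ only in inserted whitespace. The following is a minimal sketch using only the Java standard library. The class and method names (BrokenWordFinder, findBrokenWords, differsOnlyInWhitespace) are hypothetical and are not part of Tika or PDFBox. It assumes the extracted text contains the same characters as the original apart from extra spaces.

```java
import java.util.ArrayList;
import java.util.List;

public class BrokenWordFinder {

    // Returns the words of `original` that show up in `extracted`
    // as two or more whitespace-separated fragments.
    static List<String> findBrokenWords(String original, String extracted) {
        List<String> broken = new ArrayList<>();
        String[] origWords = original.trim().split("\\s+");
        String[] extWords = extracted.trim().split("\\s+");
        int j = 0; // index of the next unconsumed extracted token
        for (String word : origWords) {
            if (j >= extWords.length) {
                break;
            }
            if (extWords[j].equals(word)) {
                j++;
                continue;
            }
            // Join consecutive extracted tokens until they are as long as the word.
            StringBuilder joined = new StringBuilder();
            int k = j;
            while (k < extWords.length && joined.length() < word.length()) {
                joined.append(extWords[k++]);
            }
            if (joined.toString().equals(word)) {
                broken.add(word);
                j = k;
            } else {
                j++; // the texts differ by more than whitespace here; move on
            }
        }
        return broken;
    }

    // True when the two strings are identical once all whitespace is removed.
    static boolean differsOnlyInWhitespace(String a, String b) {
        return a.replaceAll("\\s+", "").equals(b.replaceAll("\\s+", ""));
    }

    public static void main(String[] args) {
        String original = "Each certified or noncertified member";
        String extracted = "Each certifi ed or noncertifi ed member";
        System.out.println(differsOnlyInWhitespace(original, extracted)); // true
        System.out.println(findBrokenWords(original, extracted)); // [certified, noncertified]
    }
}
```

The same comparison can be run against text extracted with pure PDFBox to check whether the split words already appear at that level.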
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)