[ https://issues.apache.org/jira/browse/TIKA-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986442#comment-15986442 ]

Tim Allison commented on TIKA-2342:
-----------------------------------

Welcome to PDFs! This _may_ be fixable at the PDFBox level. See:
https://wiki.apache.org/tika/Troubleshooting%20Tika#PDF_Text_Problems

and, more generally:
https://wiki.apache.org/tika/PDFParser%20(Apache%20PDFBox)

If you can reproduce this with pure PDFBox, please open an issue on the PDFBox JIRA.



> Broken words
> ------------
>
>                 Key: TIKA-2342
>                 URL: https://issues.apache.org/jira/browse/TIKA-2342
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.14
>         Environment: Tika app and Tika server
>            Reporter: Nino Skopac
>
> Original PDF text: "Each certified or noncertified member"
> Tika extracted text: "Each certifi ed or noncertifi ed member"



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
