[
https://issues.apache.org/jira/browse/TIKA-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489606#comment-16489606
]
Tim Allison commented on TIKA-2650:
-----------------------------------
Can you share with us exactly where the soft-hyphen isn't working? I see it
working sometimes. Note that there is often a difference between the text as
displayed and the text that is electronically stored (OCR'd?) within the PDF.
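One way to tell whether the soft hyphen is present in the stored text is to
dump the code points around the break. The following is only a minimal sketch
in plain Java: the class name, method names, and sample string are hypothetical
and not part of Tika, and it assumes the extracted text is already available as
a String.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika: locates soft hyphens in extracted text.
final class SoftHyphenProbe {
    // U+00AD SOFT HYPHEN; distinct from U+002D HYPHEN-MINUS and U+0020 SPACE.
    static final char SOFT_HYPHEN = '\u00AD';

    /** Returns the char offsets of every soft hyphen in the text. */
    static List<Integer> softHyphenOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == SOFT_HYPHEN) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    /** Renders text[from, to) as U+XXXX values so invisible characters show up. */
    static String describe(String text, int from, int to) {
        StringBuilder sb = new StringBuilder();
        int start = Math.max(0, from);
        int end = Math.min(text.length(), to);
        for (int i = start; i < end; i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed sample: one word with a soft hyphen, one split by a space.
        String extracted = "rab\u00ADbit and rab bit";
        for (int off : softHyphenOffsets(extracted)) {
            System.out.println(off + ": " + describe(extracted, off - 2, off + 3));
        }
    }
}
```

If the dump shows U+0020 where U+00AD was expected, the character was lost or
replaced before the text reached the caller; if U+00AD never appears at all,
the PDF may not store it in the first place.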
> Soft-hyphen is not extracted properly
> -------------------------------------
>
> Key: TIKA-2650
> URL: https://issues.apache.org/jira/browse/TIKA-2650
> Project: Tika
> Issue Type: Bug
> Components: app
> Affects Versions: 1.18
> Reporter: Saurabh Patil
> Priority: Blocker
> Attachments: Peter Rabbit.pdf
>
>
> We are trying to extract text from a PDF. If the PDF has a long word at the
> end of a line, the word is split with a soft hyphen and the rest of the word
> continues on the next line. But when extracting this text, Tika automatically
> replaces the soft hyphen with a space.
>
>