[https://issues.apache.org/jira/browse/PDFBOX-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894409#comment-16894409]
Tilman Hausherr commented on PDFBOX-4612:
-----------------------------------------
The file has another problem: "develo ent". This is caused by a single
article bead that throws the text order into chaos. Pass "-ignoreBeads" to the
command-line app (ExtractText), or call "setShouldSeparateByBeads(false)" on
the PDFTextStripper.
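For API users, a minimal sketch of the stripper-side workaround, assuming PDFBox 2.0.x on the classpath. The class and method names (ExtractWithoutBeads, newStripper, extract) are hypothetical and for illustration only. Only setShouldSeparateByBeads(false), getSeparateByBeads() and getText() are PDFBox calls.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class, not part of PDFBox.
final class ExtractWithoutBeads {

    /** Builds a stripper that does not split text along article beads. */
    static PDFTextStripper newStripper() throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        // Turn off bead-based separation, so a single broken bead
        // cannot scramble the reading order of the page.
        stripper.setShouldSeparateByBeads(false);
        return stripper;
    }

    /** Extracts all text from the given PDF with bead separation disabled. */
    static String extract(File pdf) throws IOException {
        // PDFBox 2.0.x loading API; in 3.0 use Loader.loadPDF(pdf) instead.
        try (PDDocument document = PDDocument.load(pdf)) {
            return newStripper().getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ExtractWithoutBeads <file.pdf>");
            System.exit(1);
        }
        System.out.println(extract(new File(args[0])));
    }
}
```

The command-line route would look something like "java -jar pdfbox-app-2.0.16.jar ExtractText -ignoreBeads input.pdf output.txt". The jar name is assumed to match the affected version.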
> The ExtractText command extracts wrong text
> -------------------------------------------
>
> Key: PDFBOX-4612
> URL: https://issues.apache.org/jira/browse/PDFBOX-4612
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.16
> Reporter: Yuri
> Priority: Major
> Attachments: bartel2018-p7.txt
>
>
> In this pdf [http://sci-hub.tw/10.1016/j.cell.2018.03.006] it extracts the
> text "ataxia, and death by ~4 months" as "ataxia, and death by ^A4 months".
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)