[ https://issues.apache.org/jira/browse/PDFBOX-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated PDFBOX-4834:
------------------------------------
    Component/s:     (was: PDModel)
                     (was: Parsing)
                 Text extraction

> Wrong read characters for Hindi conjuncts
> -----------------------------------------
>
>                 Key: PDFBOX-4834
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4834
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.19
>         Environment: Windows 10, Java 9.
>            Reporter: Hesham
>            Priority: Minor
>         Attachments: PDFBOX-4834-Hindi.pdf
>
>
> When reading this Hindi PDF book using PDFBox 2.0.19:
> [https://dl.dropboxusercontent.com/s/laixlb5omvjqr7y/Hindi%20Book.pdf?dl=0]
>
> It reads it with some wrong characters for conjuncts as it appears in this file:
> [https://dl.dropboxusercontent.com/s/efyxz2eg37gvn4c/Text%20read%20by%20PDFBox%202.0.19.txt?dl=0]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)