Ali Majdzadeh Kohbanani created PDFBOX-1424:
-----------------------------------------------
Summary: Wrong glyph (Persian) is used in extracted text instead
of the original glyph (Persian) in PDF file
Key: PDFBOX-1424
URL: https://issues.apache.org/jira/browse/PDFBOX-1424
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 1.7.1
Environment: Windows XP, Java 1.6.0
Reporter: Ali Majdzadeh Kohbanani
Hi
I am very new to PDFBox and I am working with Persian PDF files. When I convert
Persian PDF files to text using PDFBox-app, some Persian glyphs, such as م, come
out wrong in the extracted text. For example, the word "هستم" is extracted as
"هستن", and "من" is extracted as "هن". Also, the word "سلام" is extracted as
"سالم".
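
To make the three mismatches above easier to compare, here is a minimal sketch that lists the positions at which an expected word and its extracted form differ. It does not call PDFBox at all. The class name GlyphDiff and the method diff are hypothetical names chosen for illustration, and the escaped code points assume the words quoted above use the standard Arabic-block letters (U+0600 range). It needs Java 8 or later for String.codePoints(), which is newer than the Java 1.6.0 listed in the environment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: compares two strings code point by code point.
class GlyphDiff {

    /** Returns one line per index at which the two strings hold different code points. */
    static List<String> diff(String expected, String extracted) {
        int[] e = expected.codePoints().toArray();
        int[] x = extracted.codePoints().toArray();
        List<String> out = new ArrayList<>();
        int n = Math.max(e.length, x.length);
        for (int i = 0; i < n; i++) {
            // A missing position (strings of different length) is reported as "(none)".
            String a = i < e.length ? String.format("U+%04X", e[i]) : "(none)";
            String b = i < x.length ? String.format("U+%04X", x[i]) : "(none)";
            if (!a.equals(b)) {
                out.add("index " + i + ": expected " + a + ", extracted " + b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each pair is {word in the PDF, word in the extracted text}, in logical order.
        // Code points are assumed from the words quoted in this report.
        String[][] pairs = {
            {"\u0647\u0633\u062A\u0645", "\u0647\u0633\u062A\u0646"},
            {"\u0645\u0646", "\u0647\u0646"},
            {"\u0633\u0644\u0627\u0645", "\u0633\u0627\u0644\u0645"},
        };
        for (String[] p : pairs) {
            System.out.println(p[0] + " -> " + p[1]);
            for (String line : diff(p[0], p[1])) {
                System.out.println("  " + line);
            }
        }
    }
}
```

Under those assumptions, the first pair differs only in its last letter (U+0645 MEEM becomes U+0646 NOON), the second only in its first letter (U+0645 MEEM becomes U+0647 HEH), and the third has U+0644 LAM and U+0627 ALEF swapped at indices 1 and 2. The swap in the third pair might point to a lam-alef ligature being decomposed in the wrong order, but that is only a guess from these three samples.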