[ https://issues.apache.org/jira/browse/PDFBOX-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475831#comment-13475831 ]
Andreas Lehmkühler commented on PDFBOX-1424:
--------------------------------------------
Please attach a sample PDF.
> Wrong glyph (Persian) is used in extracted text instead of the original glyph
> (Persian) in PDF file
> ---------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-1424
> URL: https://issues.apache.org/jira/browse/PDFBOX-1424
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.7.1
> Environment: Windows XP, Java 1.6.0
> Reporter: Ali Majdzadeh Kohbanani
>
> Hi
> I am very new to PDFBox and I am working with Persian PDF files. When I
> convert Persian PDF files using PDFBox-app, some Persian glyphs such as م are
> replaced by the wrong characters in the extracted text. For example, the word
> "هستم" is extracted as "هستن", and "من" is extracted as "هن".
> Also, the word "سلام" is extracted as "سالم". I have also tested
> extracting text from a complete multi-page Persian PDF file, and the
> result is totally wrong. Please let me know if
> I should upload an example Persian PDF file.
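
Mismatches like the ones quoted above are easier to pin down when the expected and extracted words are compared code point by code point. The following is only a minimal sketch for illustration and is not part of PDFBox: the class name GlyphDiff and its methods are hypothetical, the words are written as Unicode escapes to avoid source-encoding problems, and it assumes Java 8 or later (for String.codePoints()), which is newer than the Java 1.6 environment in the report.

```java
import java.util.StringJoiner;

// Hypothetical helper, not a PDFBox class: lists the code points of a string
// and finds where an extracted word first differs from the expected one.
public class GlyphDiff {

    /** Formats every code point of s as U+XXXX, separated by single spaces. */
    public static String codePoints(String s) {
        StringJoiner out = new StringJoiner(" ");
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out.toString();
    }

    /** Returns the code point index of the first difference, or -1 if the strings are equal. */
    public static int firstMismatch(String expected, String actual) {
        int[] e = expected.codePoints().toArray();
        int[] a = actual.codePoints().toArray();
        int n = Math.min(e.length, a.length);
        for (int i = 0; i < n; i++) {
            if (e[i] != a[i]) {
                return i;
            }
        }
        // One string is a prefix of the other, or they are identical.
        return e.length == a.length ? -1 : n;
    }

    public static void main(String[] args) {
        // Word pairs taken from the report: { expected, extracted }.
        String[][] pairs = {
            { "\u0647\u0633\u062A\u0645", "\u0647\u0633\u062A\u0646" },
            { "\u0645\u0646", "\u0647\u0646" },
            { "\u0633\u0644\u0627\u0645", "\u0633\u0627\u0644\u0645" },
        };
        for (String[] p : pairs) {
            System.out.println("expected:  " + codePoints(p[0]));
            System.out.println("extracted: " + codePoints(p[1]));
            System.out.println("first mismatch at index " + firstMismatch(p[0], p[1]));
        }
    }
}
```

Including this kind of code point listing alongside the sample PDF shows whether a character was substituted or whether two characters were only reordered.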