[jira] [Created] (PDFBOX-1424) Wrong glyph (Persian) is used in extracted text instead of the original glyph (Persian) in PDF file

Ali Majdzadeh Kohbanani (JIRA) Tue, 09 Oct 2012 15:50:05 -0700

Ali Majdzadeh Kohbanani created PDFBOX-1424:
-----------------------------------------------


             Summary: Wrong glyph (Persian) is used in extracted text instead
of the original glyph (Persian) in PDF file
                 Key: PDFBOX-1424
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1424
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.7.1
         Environment: Windows XP, Java 1.6.0
            Reporter: Ali Majdzadeh Kohbanani


Hi,
I am very new to PDFBox and I am working with Persian PDF files. When I extract
text from Persian PDF files using PDFBox-app, some Persian glyphs, such as م,
come out wrong in the extracted text. For example, the Persian word "هستم" is
extracted as "هستن", and "من" is extracted as "هن". Also, the word "سلام" is
extracted as "سالم".
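
To make the three mismatches easier to compare, here is a minimal sketch
(not part of PDFBox) that lists the code points of each expected/extracted
pair and the positions where they differ. It assumes the words above are
stored as base Arabic-block letters rather than presentation forms; the
helper names describe, diff_positions and same_letters are made up for this
illustration.

```python
import unicodedata

# Expected/extracted pairs from the report, written as escapes so the
# logical (storage) order is unambiguous.
# Assumption: base Arabic-block letters, not presentation forms.
PAIRS = [
    ("\u0647\u0633\u062A\u0645", "\u0647\u0633\u062A\u0646"),
    ("\u0645\u0646", "\u0647\u0646"),
    ("\u0633\u0644\u0627\u0645", "\u0633\u0627\u0644\u0645"),
]


def describe(text):
    """Return one 'U+XXXX NAME' string per code point, in logical order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


def diff_positions(expected, extracted):
    """Return the indices at which the two strings differ."""
    length = max(len(expected), len(extracted))
    return [
        i
        for i in range(length)
        if i >= len(expected) or i >= len(extracted) or expected[i] != extracted[i]
    ]


def same_letters(expected, extracted):
    """True if both strings contain the same code points, in any order."""
    return sorted(expected) == sorted(extracted)


if __name__ == "__main__":
    for expected, extracted in PAIRS:
        print("expected :", describe(expected))
        print("extracted:", describe(extracted))
        print("differs at:", diff_positions(expected, extracted))
        print("same letters, different order:", same_letters(expected, extracted))
        print()
```

Under that assumption, the first two pairs differ in a single position, where
م (U+0645) has been replaced by another letter. The third pair contains the
same letters with ل and ا swapped, so it may be an ordering problem rather
than a substitution.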

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
