[jira] Commented: (PDFBOX-420) Japanese Characters are garbled.

Yigal Dayan (JIRA) Sun, 04 Apr 2010 06:40:52 -0700

    [ https://issues.apache.org/jira/browse/PDFBOX-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853252#action_12853252 ]


Yigal Dayan commented on PDFBOX-420:
------------------------------------

Mapping 'Identity-H' to JIS also creates problems in some Arabic PDFs.  These 
PDFs use 'Identity-H' to encode two Arabic characters in one glyph. The patch 
causes these Arabic characters to be sent to the CJK converter, where they get 
corrupted.
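
One possible way to avoid this, sketched below, is to stop treating 
'Identity-H' by itself as a sign of Japanese text. The identity CMaps say 
nothing about language; the font's CIDSystemInfo entry (Registry and Ordering, 
e.g. Adobe-Japan1) names the character collection. The class and method names 
here are hypothetical and are not part of PDFBox or of the attached patches, 
and the assumption that the affected Arabic fonts report a non-Japanese 
ordering (such as Adobe-Identity) has not been checked against the PDFs in 
question.

```java
/**
 * Hypothetical helper, not part of PDFBox: decides whether text from a font
 * that uses an identity CMap should be passed to a Japanese converter.
 */
final class IdentityCMapRouting {

    /** True for the two predefined identity CMaps, which carry no language information. */
    static boolean isIdentityCMap(String cmapName) {
        return "Identity-H".equals(cmapName) || "Identity-V".equals(cmapName);
    }

    /** True when CIDSystemInfo names an Adobe Japanese collection such as Adobe-Japan1. */
    static boolean isJapaneseCollection(String registry, String ordering) {
        return "Adobe".equals(registry)
                && ordering != null
                && ordering.startsWith("Japan");
    }

    /**
     * Routes identity-encoded text to the Japanese converter only when the
     * font also declares a Japanese character collection. Named CMaps are
     * outside the scope of this sketch and always return false here.
     */
    static boolean shouldUseJapaneseConverter(String cmapName, String registry, String ordering) {
        return isIdentityCMap(cmapName) && isJapaneseCollection(registry, ordering);
    }
}
```

Under this check, an 'Identity-H' font whose ordering is not a Japanese 
collection would bypass the CJK converter, while fonts declaring Adobe-Japan1 
would still be converted.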


> Japanese Characters are garbled.
> --------------------------------
>
>                 Key: PDFBOX-420
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-420
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>            Reporter: Takashi Komatsubara
>            Priority: Critical
>             Fix For: 1.1.0
>
>         Attachments: supportJapanese-fontbox.patch, supportJapanese.patch, 
> TestFilesForJapaneseGarbledIssue.zip, textextract._20090326_01.zip
>
>
> The extracted Japanese characters are completely garbled.
> This issue is very critical for Japanese users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
