[ https://issues.apache.org/jira/browse/PDFBOX-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853252#action_12853252 ]

Yigal Dayan commented on PDFBOX-420:
------------------------------------
Mapping 'Identity-H' to JIS also creates problems in some Arabic PDFs. These
PDFs use 'Identity-H' to encode two Arabic characters in one glyph. The patch
causes these Arabic characters to be sent to the CJK converter where they get
corrupted.
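
A minimal sketch of the distinction, not PDFBox's actual code: 'Identity-H'
only maps 2-byte codes straight to CIDs and says nothing about the script, so
the decision to use a CJK converter would have to come from the font's
CIDSystemInfo ordering (for example "Japan1") rather than from the CMap name.
The class, enum, and method names below are hypothetical and exist only for
illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of PDFBox: picks a decoding path for a
// composite font from its CIDSystemInfo ordering.
public class CMapDecoderChoice {

    enum Decoder { CJK_CONVERTER, TO_UNICODE_ONLY }

    // Orderings of the Adobe CJK character collections
    // (Adobe-Japan1, Adobe-GB1, Adobe-CNS1, Adobe-Korea1).
    private static final Set<String> CJK_ORDERINGS =
            new HashSet<>(Arrays.asList("Japan1", "GB1", "CNS1", "Korea1"));

    // The CMap name is deliberately not consulted: 'Identity-H' is used by
    // Japanese and Arabic fonts alike, so it cannot identify the script.
    static Decoder choose(String ordering) {
        if (ordering != null && CJK_ORDERINGS.contains(ordering)) {
            return Decoder.CJK_CONVERTER;
        }
        // Anything else (e.g. ordering "Identity") is left to the font's
        // ToUnicode mapping instead of being pushed through a CJK converter.
        return Decoder.TO_UNICODE_ONLY;
    }

    public static void main(String[] args) {
        System.out.println(choose("Japan1"));
        System.out.println(choose("Identity"));
    }
}
```

With a check like this, an Arabic font that uses 'Identity-H' with a non-CJK
ordering would bypass the CJK converter, while Japanese fonts would still
reach it.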
> Japanese Characters are garbled.
> --------------------------------
>
> Key: PDFBOX-420
> URL: https://issues.apache.org/jira/browse/PDFBOX-420
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 0.8.0-incubator
> Reporter: Takashi Komatsubara
> Priority: Critical
> Fix For: 1.1.0
>
> Attachments: supportJapanese-fontbox.patch, supportJapanese.patch,
> TestFilesForJapaneseGarbledIssue.zip, textextract._20090326_01.zip
>
>
> The extracted Japanese characters are completely garbled.
> This issue is very critical for Japanese users.