[
https://issues.apache.org/jira/browse/PDFBOX-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227896#comment-14227896
]
John Hewson commented on PDFBOX-2509:
-------------------------------------
That code is a direct copy of the algorithm specified in the PDF spec, so we
won't take a patch which just shoves in extra hacks at the end. Your mechanism
is correct: we need to use the descendant font's CIDSystemInfo to select a
fallback CMap. But we need to populate cMap, cMapUCS2, and isCMapPredefined
correctly in readEncoding() and fetchCMapUCS2() instead of messing with the
standard algorithm in toUnicode().
The spec does mention this, when it says:
{quote}
if the font is composite and uses a predefined cmap (excluding Identity-H/V)
*or if its descendant font* uses Adobe-GB1/CNS1/Japan1/Korea1 then ...
{quote}
PDFBox was missing the part in bold.
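The condition quoted above can be sketched roughly as follows. This is a hypothetical helper, not PDFBox code: the class and method names are made up, and, following the suggestion above, it takes the registry and ordering from the descendant font's CIDSystemInfo. It only covers deciding whether the spec's branch applies and building the registry-ordering-UCS2 CMap name; the code-to-CID and CID-to-Unicode lookups are left out.

```java
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical sketch (not PDFBox API) of the composite-font branch of the
 * spec's Unicode fallback: decide whether a UCS2 CMap applies and build its
 * name as registry-ordering-UCS2.
 */
final class Ucs2FallbackSketch {

    // Character collections named in the spec's descendant-font clause.
    private static final Set<String> CJK_ORDERINGS =
            Set.of("GB1", "CNS1", "Japan1", "Korea1");

    private Ucs2FallbackSketch() {
    }

    /**
     * @param cmapName         name of the Type 0 font's Encoding CMap, e.g. "Identity-H"
     * @param cmapIsPredefined whether that CMap is one of the predefined CMaps
     * @param registry         /Registry from the descendant font's CIDSystemInfo
     * @param ordering         /Ordering from the descendant font's CIDSystemInfo
     * @return the UCS2 CMap name, e.g. "Adobe-Korea1-UCS2", or empty if the
     *         spec's condition is not met
     */
    static Optional<String> ucs2CMapName(String cmapName, boolean cmapIsPredefined,
                                         String registry, String ordering) {
        if (registry == null || ordering == null) {
            return Optional.empty(); // no CIDSystemInfo to build a name from
        }
        boolean isIdentity = "Identity-H".equals(cmapName) || "Identity-V".equals(cmapName);
        // First clause: a predefined CMap other than Identity-H/V.
        boolean predefinedNonIdentity = cmapIsPredefined && !isIdentity;
        // Second clause (the part PDFBox was missing): the descendant font
        // uses one of the Adobe CJK character collections.
        boolean cjkDescendant = "Adobe".equals(registry) && CJK_ORDERINGS.contains(ordering);
        if (!predefinedNonIdentity && !cjkDescendant) {
            return Optional.empty();
        }
        return Optional.of(registry + "-" + ordering + "-UCS2");
    }
}
```

For example, an Identity-H font whose descendant uses Adobe-Korea1 would not qualify under the first clause alone, but the second clause yields "Adobe-Korea1-UCS2". In the spirit of the comment, a check like this would run once while the encoding is read, so toUnicode() itself can stay a plain transcription of the spec's algorithm.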
> Korean Text wrong
> -----------------
>
> Key: PDFBOX-2509
> URL: https://issues.apache.org/jira/browse/PDFBOX-2509
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.0
> Reporter: simon steiner
> Assignee: John Hewson
> Fix For: 2.1.0
>
> Attachments: japan.patch, pdfbox147.png, pdfbox238.png,
> pdfbox238_2.png, pdfbox328.png
>
>
> http://acroeng.adobe.com/Test_Files/fonts/asian%20font%20files/Korean/nonembedded/K4SystemFontsNotEmbeded218.PDF
> and
> http://acroeng.adobe.com/Test_Files/fonts/asian%20font%20files/Korean/nonembedded/KGulimcheNotembeded218.PDF
> and
> http://acroeng.adobe.com/Test_Files/fonts/asian%20font%20files/Korean/nonembedded/VariousKFontsNotembeded218.PDF
> and
> http://acroeng.adobe.com/Test_Files/fonts//EmbeddedCmap.pdf
> and
> http://acroeng.adobe.com/Test_Files/fonts/asian%20font%20files/Japanese/nonembedded/Jun101.pdf
> and
> http://acroeng.adobe.com/Test_Files/fonts/asian%20font%20files/Japanese/nonembedded/ACPTJ_WIN_MSGothic.DOC.pdf
> java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage K4SystemFontsNotEmbeded218.PDF
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)