[ 
https://issues.apache.org/jira/browse/PDFBOX-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244792#comment-14244792
 ] 

John Hewson commented on PDFBOX-2524:
-------------------------------------

Ok, I've applied a refactored version of your patch. The embedding code is now 
in PDCIDFontType2Embedder, and your example program has been added as 
EmbeddedFonts in org.apache.pdfbox.examples. Thanks!

However, copy & paste doesn't work for the resulting text, because the 
Identity CID2GIDMap implies that CID = Unicode, but we're using CID = GID. 
This means that Acrobat tries to read the text for copy & paste as 
Unicode = GID, which is obviously nonsense.

The solution is to generate a ToUnicode CMap for the Type0 font. I have some 
code to do this, which I'll commit soon.
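
For illustration only, here is a minimal sketch of what generating such a 
ToUnicode CMap could look like. It is not the code referred to above and not 
PDFBox API: the class name ToUnicodeCMapSketch, its methods, and the GID 
values in main are hypothetical. It assumes a GID -> Unicode code point map is 
already available (for example by inverting the font's cmap table, which is 
not shown) and that character codes are 2 bytes wide. The CMap layout follows 
the general form of the ToUnicode example in the PDF specification.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of PDFBox: builds the text of a ToUnicode CMap
// from a GID -> Unicode code point map (the map is assumed to exist already).
final class ToUnicodeCMapSketch {
    // bfchar blocks are limited to 100 entries each.
    private static final int MAX_ENTRIES_PER_BLOCK = 100;

    static String build(Map<Integer, Integer> gidToCodePoint) {
        TreeMap<Integer, Integer> sorted = new TreeMap<>(gidToCodePoint);
        StringBuilder sb = new StringBuilder();
        sb.append("/CIDInit /ProcSet findresource begin\n");
        sb.append("12 dict begin\n");
        sb.append("begincmap\n");
        sb.append("/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def\n");
        sb.append("/CMapName /Adobe-Identity-UCS def\n");
        sb.append("/CMapType 2 def\n");
        sb.append("1 begincodespacerange\n<0000> <FFFF>\nendcodespacerange\n");

        int remaining = sorted.size();
        int inBlock = 0;
        for (Map.Entry<Integer, Integer> e : sorted.entrySet()) {
            if (inBlock == 0) {
                // Open a new block and announce how many entries it holds.
                sb.append(Math.min(remaining, MAX_ENTRIES_PER_BLOCK)).append(" beginbfchar\n");
            }
            // Source code is the GID (because CID = GID); target is UTF-16BE hex.
            sb.append('<').append(String.format("%04X", e.getKey())).append("> <")
              .append(utf16Hex(e.getValue())).append(">\n");
            inBlock++;
            remaining--;
            if (inBlock == MAX_ENTRIES_PER_BLOCK || remaining == 0) {
                sb.append("endbfchar\n");
                inBlock = 0;
            }
        }

        sb.append("endcmap\n");
        sb.append("CMapName currentdict /CMap defineresource pop\n");
        sb.append("end\nend\n");
        return sb.toString();
    }

    // Encodes one code point as UTF-16BE hex; supplementary characters
    // become a surrogate pair (two 16-bit units).
    static String utf16Hex(int codePoint) {
        StringBuilder hex = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            hex.append(String.format("%04X", (int) unit));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> gidToCodePoint = new TreeMap<>();
        gidToCodePoint.put(36, 0x41);      // made-up GID for 'A'
        gidToCodePoint.put(1234, 0x3042);  // made-up GID for HIRAGANA LETTER A
        gidToCodePoint.put(5000, 0x20BB7); // made-up GID for a supplementary CJK character
        System.out.print(build(gidToCodePoint));
    }
}
```

The resulting text would be stored as a stream and referenced from the 
/ToUnicode entry of the Type0 font dictionary, so that a viewer can map each 
code back to Unicode for copy & paste instead of guessing from the GID.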

> [PATCH] Two PDFont to create PDF documents in CJK and non-ISO-8859-1 languages
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-2524
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2524
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 2.0.0
>            Reporter: Keiji Suzuki
>            Assignee: John Hewson
>         Attachments: Type0.java, Type0CJK.java, Type0Unicode.java, 
> cidtype0.diff, cidtype2.diff, two-new-fonts.diff, type0bom.pdf, type0nobom.pdf
>
>
> I made two PDFont classes for creating PDF documents in CJK and 
> non-ISO-8859-1 languages.
> One is PDType0CJKFont. This is for using CJK fonts included in the Asian font 
> package of Adobe Reader. This font doesn't require the target font to be 
> present when the PDF document is created. It uses UTF-16 as its text 
> encoding and supports surrogate pair characters.
> The other is PDType0UnicodeFont. This is for using a TrueType Type0 font that 
> can handle any Unicode character, such as ArialUnicodeMS. Only the 
> characters actually used in the document are embedded. To make this work, 
> you have to call the PDType0Unicode.reloadFont() method just before 
> closing the PDPageContentStream. I think this requirement is ugly, but I 
> could not think of a suitable way to remove it. This font uses the 
> original glyph codes of the embedded font as its text encoding and supports 
> surrogate pair characters too.
> Example programs using these two fonts are also attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
