[
https://issues.apache.org/jira/browse/PDFBOX-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243858#comment-14243858
]
John Hewson commented on PDFBOX-2524:
-------------------------------------
Thanks for your updated patch. I looked into the problem with the BOM, and it
is caused by a mistake in how PDFBox handles strings that are written to the
content stream. We shouldn't have been using COSString there, as it is meant
only for strings that appear in PDF dictionaries, which are encoded as
UTF-16BE. I've fixed this in PDFBOX-1242, with quite a large change [
https://svn.apache.org/r1644828 ].
I'm going to refactor your patch to take advantage of these changes.
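
The byte-level difference behind the BOM problem can be reproduced with the
JDK alone. This is only a minimal sketch, not PDFBox code: the class BomDemo
and its helper methods are hypothetical names made up for illustration. It
relies on the fact that Java's "UTF-16" charset writes a big-endian byte-order
mark (FE FF) when encoding, while "UTF-16BE" does not. In a content-stream
string those two extra bytes would presumably be read as one more character
code by a two-byte font encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of PDFBox.
final class BomDemo {
    // Formats bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Java's UTF-16 encoder prepends a big-endian BOM (FE FF).
    static byte[] withBom(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    // UTF-16BE produces the same code units without a BOM.
    static byte[] withoutBom(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(hex(withBom("A")));    // FE FF 00 41
        System.out.println(hex(withoutBom("A"))); // 00 41
    }
}
```
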
> [PATCH] Two PDFont to create PDF documents in CJK and non-ISO-8859-1 languages
> ------------------------------------------------------------------------------
>
> Key: PDFBOX-2524
> URL: https://issues.apache.org/jira/browse/PDFBOX-2524
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 2.0.0
> Reporter: Keiji Suzuki
> Assignee: John Hewson
> Attachments: Type0.java, Type0CJK.java, Type0Unicode.java,
> cidtype0.diff, cidtype2.diff, two-new-fonts.diff, type0bom.pdf, type0nobom.pdf
>
>
> I made two PDFont classes for creating PDF documents in CJK and
> non-ISO-8859-1 languages.
> One is PDType0CJKFont. This is for using the CJK fonts included in the Asian
> font package of Adobe Reader. This font doesn't require the target font to be
> available at the time the PDF document is created. It uses UTF-16 as its text
> encoding and supports surrogate pair characters.
> The other is PDType0UnicodeFont. This is for using a TrueType Type0 font that
> can handle any Unicode character, such as ArialUnicodeMS. Only the characters
> that are actually used in the document are embedded. To achieve this, you
> have to call the PDType0Unicode.reloadFont() method just before closing the
> PDPageContentStream. I think this requirement is ugly, but I could not think
> of a suitable way to remove it. This font uses the original glyph codes of
> the embedded font as its text codes and also supports surrogate pair
> characters.
> Example programs using these two fonts are also attached.
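
For readers unfamiliar with the surrogate pairs mentioned in the quoted
description, here is a small JDK-only sketch. It does not use the attached
font classes; SurrogateDemo and its methods are hypothetical names chosen for
illustration. It shows that a character outside the Basic Multilingual Plane,
here U+20BB7 (a CJK Extension B ideograph), is one code point but two UTF-16
code units, so it takes four bytes in UTF-16BE.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the attached patch.
final class SurrogateDemo {
    // Builds a String holding exactly one Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Counts code points, treating each surrogate pair as one character.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Encodes as UTF-16BE without a byte-order mark.
    static byte[] utf16be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x20BB7);
        System.out.println(s.length());          // 2 (UTF-16 code units)
        System.out.println(codePointCount(s));   // 1 (code point)
        for (byte b : utf16be(s)) {
            System.out.printf("%02X ", b & 0xFF); // D8 42 DF B7
        }
        System.out.println();
    }
}
```

Code that walks a string one `char` at a time would see the high and low
surrogates as two separate values, which is why explicit surrogate pair
support is worth calling out for a font class.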