[
https://issues.apache.org/jira/browse/PDFBOX-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233866#comment-14233866
]
Keiji Suzuki commented on PDFBOX-2524:
--------------------------------------
Thank you for your review. I reply to each point inline.
> Your new PDType0Font constructor shouldn't call readEncoding() or
> fetchCMapUCS2(), as those methods are for reading a font from a PDF, not
> embedding a new font.
I had misunderstood their purpose. I have removed these calls.
> getFontWidthsArray is parsing a string of space delimited integers, which you
> created in PDCIDFontType2Embedder#setItemsForCIDFont. These two methods
> should not be using strings to exchange data, what was your reason for doing
> this?
This method was originally written for PDType0CJKFont, whose width data is
stored as a String in a property file. That is also why the method was defined
in PDType0Font. I have refactored it to take an int[] parameter.
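As a rough illustration of the difference (this is not the code in the patch;
the class and method names below are hypothetical), the old string round trip
and the direct int[] hand-off look roughly like this:

```java
import java.util.Arrays;

// Hypothetical sketch only; WidthsSketch and its methods are not part of PDFBox.
final class WidthsSketch {

    // Old style: widths arrive as a space-delimited string (e.g. from a
    // property file) and must be parsed before use.
    static int[] parseWidths(String spaceDelimited) {
        String trimmed = spaceDelimited.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        String[] tokens = trimmed.split("\\s+");
        int[] widths = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            widths[i] = Integer.parseInt(tokens[i]);
        }
        return widths;
    }

    // New style: the embedder hands over an int[] directly, so no formatting
    // or parsing is needed. A defensive copy keeps the caller's array private.
    static int[] acceptWidths(int[] widths) {
        return widths.clone();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseWidths("500 1000 1000")));
        System.out.println(Arrays.toString(acceptWidths(new int[] {500, 1000, 1000})));
    }
}
```

Only the property-file case still needs the parsing step; the embedder can
pass its array as-is.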
> CmapSubtable#getGlyphIdToCharacterCode() exposes private implementation
> details from CmapSubtable, however I'd recommend using CID = GID rather than
> your current approach, which would mean that you won't need this information
> anyway.
> Using CID = GID would make getCIDToGID redundant, and generate smaller PDF
> files because you can use the Identity cid2gid mapping.
I had overlooked the COSName "Identity" for CIDToGIDMap and have changed the
code to use it. However, in the next step, when we rebuild the embedded font
with TTFSubsetter, we will have to use a CIDToGIDMap stream, because the glyph
IDs change when the embedded font is rebuilt.
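To make the subsetting case concrete, here is a minimal sketch of how the
bytes of a CIDToGIDMap stream could be built. It assumes the layout given in
the PDF specification (two bytes per CID, high-order byte first, where the
entry at index CID holds the GID). The class and method names are hypothetical
and nothing here uses the PDFBox or TTFSubsetter API:

```java
// Hypothetical sketch only; not PDFBox code.
final class CidToGidMapSketch {

    // cidToGid[cid] is the glyph ID in the (subset) font for that CID.
    // Returns the raw stream data: 2 bytes per CID, big-endian.
    static byte[] buildCidToGidMap(int[] cidToGid) {
        byte[] data = new byte[cidToGid.length * 2];
        for (int cid = 0; cid < cidToGid.length; cid++) {
            int gid = cidToGid[cid];
            if (gid < 0 || gid > 0xFFFF) {
                throw new IllegalArgumentException("GID out of range: " + gid);
            }
            data[2 * cid] = (byte) (gid >> 8);       // high-order byte
            data[2 * cid + 1] = (byte) (gid & 0xFF); // low-order byte
        }
        return data;
    }

    // When every CID equals its GID, the stream is unnecessary and the
    // name "Identity" can be used instead.
    static boolean isIdentity(int[] cidToGid) {
        for (int cid = 0; cid < cidToGid.length; cid++) {
            if (cidToGid[cid] != cid) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Example: after subsetting, CIDs 1 and 2 map to renumbered glyphs.
        int[] afterSubset = {0, 300, 2};
        System.out.println(isIdentity(afterSubset));
        System.out.println(buildCidToGidMap(afterSubset).length);
    }
}
```

With the full font and CID = GID, isIdentity holds and no stream is needed;
once glyphs are renumbered it no longer holds, which is why the stream comes
back at that step.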
> Please remove unused import statements
> Please do not import with .*
Done.
BTW, PDF documents made with this revised version of the font are not
displayed by Adobe Reader, but Chrome's built-in PDF reader displays them
correctly. I noticed that removing the BOM (0xFEFF) seems to fix it. I have
attached two PDF documents, one with and one without the BOM.
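For reference, a small sketch of where such a BOM can come from in Java,
assuming the text is encoded as UTF-16 big-endian. The class and method names
are hypothetical and this is not the code in the patch. Java's "UTF-16"
charset writes a big-endian BOM when encoding, while "UTF-16BE" does not:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch only; not the code in the patch.
final class BomSketch {

    // StandardCharsets.UTF_16 prepends the BOM bytes FE FF when encoding.
    static byte[] encodeWithBom(String text) {
        return text.getBytes(StandardCharsets.UTF_16);
    }

    // StandardCharsets.UTF_16BE encodes big-endian without any BOM.
    static byte[] encodeWithoutBom(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE);
    }

    // Drops a leading FE FF if present; otherwise returns the input unchanged.
    static byte[] stripBom(byte[] bytes) {
        if (bytes.length >= 2 && (bytes[0] & 0xFF) == 0xFE && (bytes[1] & 0xFF) == 0xFF) {
            return Arrays.copyOfRange(bytes, 2, bytes.length);
        }
        return bytes;
    }

    public static void main(String[] args) {
        String text = "\u3042"; // HIRAGANA LETTER A
        System.out.println(encodeWithBom(text).length);    // BOM + one code unit
        System.out.println(encodeWithoutBom(text).length); // one code unit only
    }
}
```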
The attached patch is against the current trunk.
> [PATCH] Two PDFont to create PDF documents in CJK and non-ISO-8859-1 languages
> ------------------------------------------------------------------------------
>
> Key: PDFBOX-2524
> URL: https://issues.apache.org/jira/browse/PDFBOX-2524
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 2.0.0
> Reporter: Keiji Suzuki
> Assignee: John Hewson
> Attachments: Type0.java, Type0CJK.java, Type0Unicode.java,
> cidtype0.diff, two-new-fonts.diff
>
>
> I made two PDFont classes for creating PDF documents in CJK and
> non-ISO-8859-1 languages.
> One is PDType0CJKFont. This is for using the CJK fonts included in the Asian
> font package of Adobe Reader. This font does not require the target font to
> be available when the PDF document is created. It uses UTF-16 as its text
> encoding and supports surrogate pair characters.
> The other is PDType0UnicodeFont. This is for using a TrueType Type0 font
> that can handle any Unicode character, such as ArialUnicodeMS. Only the
> characters that are actually used in the document are embedded. To achieve
> this, you have to call the PDType0Unicode.reloadFont() method just before
> closing PDPageContentStream. I think this requirement is ugly, but I could
> not think of a suitable way to remove it. This font uses the original glyph
> code of the embedded font as its text code and also supports surrogate pair
> characters.
> Example programs using these two fonts are also attached.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)