[jira] [Commented] (PDFBOX-2524) [PATCH] Two PDFont to create PDF documents in CJK and non-ISO-8859-1 languages

Keiji Suzuki (JIRA) Wed, 03 Dec 2014 19:06:49 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233866#comment-14233866
 ]


Keiji Suzuki commented on PDFBOX-2524:
--------------------------------------

Thank you for your review. I reply to each point inline.

> Your new PDType0Font constructor shouldn't call readEncoding() or 
> fetchCMapUCS2(), as those methods are for reading a font from a PDF, not 
> embedding a new font.

I had misunderstood what these methods are for. I have deleted these calls.

> getFontWidthsArray is parsing a string of space delimited integers, which you 
> created in PDCIDFontType2Embedder#setItemsForCIDFont. These two methods 
> should not be using strings to exchange data, what was your reason for doing 
> this?

This method was originally written for PDType0CJKFont, whose width data are 
stored as strings in a property file. It was defined in PDType0Font for the 
same reason. I have refactored it to take an int[] parameter.
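
As a rough sketch of the idea (the class and method names below are 
hypothetical and are not taken from the patch): a width list that arrives as a 
space-delimited string, for example from a property file, can be parsed once 
at that boundary, so that the font classes only exchange int[].

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the attached patch.
final class WidthsSketch {

    // Parses a space-delimited list of integer widths, such as a value read
    // from a property file, into an int[] that can be passed between methods.
    static int[] parseWidths(String spaceDelimited) {
        String trimmed = spaceDelimited.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        String[] tokens = trimmed.split("\\s+");
        int[] widths = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            widths[i] = Integer.parseInt(tokens[i]);
        }
        return widths;
    }

    public static void main(String[] args) {
        // The sample values are made up.
        int[] widths = parseWidths("500 1000 1000 250");
        System.out.println(Arrays.toString(widths));
    }
}
```

With this split, only the property-file reader deals with strings; an embedder 
that already has the widths as integers can pass them on directly.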

> CmapSubtable#getGlyphIdToCharacterCode() exposes private implementation 
> details from CmapSubtable, however I'd recommend using CID = GID rather than 
> your current approach, which would mean that you won't need this information 
> anyway.
> Using CID = GID would make getCIDToGID redundant, and generate smaller PDF 
> files because you can use the Identity cid2gid mapping.

I had overlooked that CIDToGIDMap can be the COSName "Identity", and I have 
changed the code to use it. However, in the next step, when we rebuild the 
embedded font with TTFSubsetter, we will have to use a CIDToGIDMap stream, 
because the glyph IDs change when the embedded font is rebuilt.
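
To illustrate what such a stream would contain, here is a minimal sketch. It 
assumes that the CIDs stay equal to the glyph IDs of the original font and that 
the subsetting step can report, for each of them, the new glyph ID in the 
subset font. The class and method names are hypothetical. The byte layout (two 
bytes per CID, high-order byte first, with the entry for CID c at offsets 2c 
and 2c + 1) is the one the PDF specification defines for a CIDToGIDMap stream.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of the attached patch.
final class CidToGidSketch {

    // Builds the raw data of a CIDToGIDMap stream from a CID -> new GID map.
    // CIDs without an entry keep the value 0, i.e. they map to GID 0 (.notdef).
    static byte[] buildCidToGidMap(Map<Integer, Integer> cidToNewGid) {
        int maxCid = -1;
        for (int cid : cidToNewGid.keySet()) {
            maxCid = Math.max(maxCid, cid);
        }
        byte[] data = new byte[(maxCid + 1) * 2];
        for (Map.Entry<Integer, Integer> entry : cidToNewGid.entrySet()) {
            int cid = entry.getKey();
            int gid = entry.getValue();
            if (cid < 0 || cid > 0xFFFF || gid < 0 || gid > 0xFFFF) {
                throw new IllegalArgumentException("CID and GID must fit in 16 bits");
            }
            data[cid * 2] = (byte) ((gid >> 8) & 0xFF); // high-order byte
            data[cid * 2 + 1] = (byte) (gid & 0xFF);    // low-order byte
        }
        return data;
    }

    public static void main(String[] args) {
        // Made-up example: original glyphs 0, 3 and 5 become glyphs 0, 1 and 2.
        Map<Integer, Integer> mapping = new TreeMap<>();
        mapping.put(0, 0);
        mapping.put(3, 1);
        mapping.put(5, 2);
        System.out.println(buildCidToGidMap(mapping).length + " bytes");
    }
}
```

The resulting bytes would be written as the content of the CIDToGIDMap stream 
in place of the name "Identity".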

> Please remove unused import statements
> Please do not import with .*

I have done both.

BTW, PDF documents made with this revised version of the font are not 
displayed by Adobe Reader, although the built-in PDF reader of Chrome displays 
them correctly. I noticed that removing the BOM (0xFEFF) seems to fix it. I 
have attached two PDF documents, one with the BOM and one without.
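
For reference, the sketch below shows one way a BOM can end up in UTF-16 text 
in Java. It is only an assumption that something similar happens in the patch; 
the class and method names are hypothetical. In Java, encoding with the 
"UTF-16" charset writes a big-endian BOM (0xFE 0xFF) in front of the text, 
while "UTF-16BE" writes the same code units without it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the attached patch.
final class BomSketch {

    // Encodes with the "UTF-16" charset: big-endian, preceded by the BOM FE FF.
    static byte[] encodeWithBom(String text) {
        return text.getBytes(StandardCharsets.UTF_16);
    }

    // Encodes with the "UTF-16BE" charset: big-endian, no BOM.
    static byte[] encodeWithoutBom(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE);
    }

    // Formats bytes as upper-case hex for display.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+20BB7 lies outside the BMP, so it is encoded as a surrogate pair.
        String text = "A" + new String(Character.toChars(0x20BB7));
        System.out.println(toHex(encodeWithBom(text)));    // FEFF0041D842DFB7
        System.out.println(toHex(encodeWithoutBom(text))); // 0041D842DFB7
    }
}
```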

The attached patch is against the current trunk.


> [PATCH] Two PDFont to create PDF documents in CJK and non-ISO-8859-1 languages
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-2524
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2524
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 2.0.0
>            Reporter: Keiji Suzuki
>            Assignee: John Hewson
>         Attachments: Type0.java, Type0CJK.java, Type0Unicode.java, 
> cidtype0.diff, two-new-fonts.diff
>
>
> I made two PDFont classes for creating PDF documents in CJK and 
> non-ISO-8859-1 languages.
> One is PDType0CJKFont. It is for using the CJK fonts included in the Asian 
> font package of Adobe Reader. This font does not require the target font to 
> be present when the PDF document is created. It uses UTF-16 as its text 
> encoding and supports surrogate pair characters.
> The other is PDType0UnicodeFont. It is for using a TrueType Type0 font that 
> can handle any Unicode character, such as ArialUnicodeMS. Only the 
> characters that are actually used in the document are embedded. To achieve 
> this, you have to call the PDType0Unicode.reloadFont() method just before 
> closing the PDPageContentStream. I think this requirement is ugly, but I 
> could not think of a suitable way to remove it. This font uses the 
> original glyph code of the embedded font as its text code and also supports 
> surrogate pair characters.
> Example programs using these two fonts are also attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
