[jira] [Updated] (PDFBOX-5328) Failing to get multiple encodings from cmap table

Tilman Hausherr (Jira) Sat, 20 Nov 2021 04:14:06 -0800


     [ 
https://issues.apache.org/jira/browse/PDFBOX-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tilman Hausherr updated PDFBOX-5328:
------------------------------------
    Fix Version/s:     (was: 1.8.17)

> Failing to get multiple encodings from cmap table
> -------------------------------------------------
>
>                 Key: PDFBOX-5328
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5328
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 1.8.16, 2.0.24
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>            Priority: Minor
>             Fix For: 2.0.25, 3.0.0 PDFBox
>
>         Attachments: NotoSansSC-Regular.otf
>
>
> As reported by Ty Lewis in the users mailing list, see 
> [here|https://mail-archives.apache.org/mod_mbox/pdfbox-users/202111.mbox/%3CCAPRgSAOG1a9kw4wSmArH0uG-N5xd9_kPq7ju4U%3DSv9H9CQZmcQ%40mail.gmail.com%3E]
> {noformat}
> Unicode encodings for GID 8712: List(U+f967)
> Unicode encodings for GID 8712 from table (platformId = 0 encodingId = 3): List(U+4e0d, U+f967)
> Unicode encodings for GID 8712 from table (platformId = 0 encodingId = 4): List(U+f967)
> {noformat}
> I made some Java code to reproduce this:
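> {code}
> import java.io.File;
> import java.io.IOException;
> import java.util.List;
> import org.apache.fontbox.ttf.CmapLookup;
> import org.apache.fontbox.ttf.CmapSubtable;
> import org.apache.fontbox.ttf.CmapTable;
> import org.apache.fontbox.ttf.OTFParser;
> import org.apache.fontbox.ttf.OpenTypeFont;
>
> public class CmapRepro
> {
>     public static void main(String[] args) throws IOException
>     {
>         File fontFile = new File("NotoSansSC-Regular.otf");
>         OTFParser otfParser = new OTFParser(false);
>         OpenTypeFont otf = otfParser.parse(fontFile);
>         // Default Unicode lookup for GID 8712
>         CmapLookup unicodeCmapLookup = otf.getUnicodeCmapLookup();
>         List<Integer> charCodes = unicodeCmapLookup.getCharCodes(8712);
>         System.out.println(charCodes);
>         // Query the two Unicode-platform subtables directly
>         CmapTable cmapTable = otf.getCmap();
>         CmapSubtable unicodeFullCmapTable = cmapTable.getSubtable(
>                 CmapTable.PLATFORM_UNICODE, CmapTable.ENCODING_UNICODE_2_0_FULL);
>         CmapSubtable unicodeBmpCmapTable = cmapTable.getSubtable(
>                 CmapTable.PLATFORM_UNICODE, CmapTable.ENCODING_UNICODE_2_0_BMP);
>         List<Integer> unicodeBmpCharCodes = unicodeBmpCmapTable.getCharCodes(8712);
>         List<Integer> unicodeFullCharCodes = unicodeFullCmapTable.getCharCodes(8712);
>         System.out.println(unicodeBmpCharCodes);
>         System.out.println(unicodeFullCharCodes);
>     }
> }
> {code}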
> A look at the tables with DTL OTMaster 3.7 Light shows that there are indeed two 
> entries for this glyph. A search for them (in hex) shows the characters 
> U+4E0D (不) and U+F967, the CJK compatibility ideograph with the same appearance.
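
For illustration only, here is a minimal sketch of why a glyph-to-code-point lookup needs to keep a list per glyph ID. It uses plain JDK collections and is not FontBox's actual implementation; the class name `ReverseCmapSketch` and the helper `invert` are hypothetical, and the two map entries are simply the values from the report above. It also checks, with `java.text.Normalizer`, that U+F967 normalizes to U+4E0D, which is why the two characters look identical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReverseCmapSketch
{
    // Hypothetical helper: invert a "code point -> glyph ID" map while
    // keeping every code point that points at the same glyph.
    static Map<Integer, List<Integer>> invert(Map<Integer, Integer> codeToGid)
    {
        Map<Integer, List<Integer>> gidToCodes = new TreeMap<>();
        for (Map.Entry<Integer, Integer> entry : codeToGid.entrySet())
        {
            gidToCodes.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                      .add(entry.getKey());
        }
        return gidToCodes;
    }

    public static void main(String[] args)
    {
        // Assumed sample data: the two cmap entries for GID 8712 from the report.
        Map<Integer, Integer> codeToGid = new TreeMap<>();
        codeToGid.put(0x4E0D, 8712);
        codeToGid.put(0xF967, 8712);

        // Both code points survive the inversion.
        for (int code : invert(codeToGid).get(8712))
        {
            System.out.printf("U+%04x%n", code);
        }

        // U+F967 is canonically equivalent to U+4E0D.
        String normalized = Normalizer.normalize("\uF967", Normalizer.Form.NFC);
        System.out.println(normalized.equals("\u4E0D"));
    }
}
```

A reverse map that stores a single value per glyph ID would keep only whichever entry was written last, which would be consistent with a lookup returning just one of the two code points.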



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
