[jira] [Commented] (PDFBOX-3884) GlyphList registers "wrong" Adobe name for "U+02DC SMALL TILDE"

JIRA Fri, 28 Jul 2017 16:17:28 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105849#comment-16105849
 ]


Matías Giovannini commented on PDFBOX-3884:
-------------------------------------------

This problem occurs in seven instances in total. In every case, the 
corresponding {{Encoding}} class uses the name that appears _later_ in the 
Glyph List, but the {{GlyphList}} class retains the earlier one for its reverse 
(Unicode-to-name) mapping, which leads to exceptions on round-trip encoding. 
As a workaround, I patch the {{Encoding}} instances using reflection:

{code:java}
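// Needs java.lang.reflect.Method plus the Encoding, WinAnsiEncoding and
// SymbolEncoding classes from org.apache.pdfbox.pdmodel.font.encoding.
// overwrite(int, String) is not accessible from outside, hence the reflection.
final Method method = Encoding.class.getDeclaredMethod("overwrite", int.class, String.class);
method.setAccessible(true);
// Codes are octal literals; each trailing comment gives the name being replaced.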
method.invoke(WinAnsiEncoding.INSTANCE, 0230, "ilde"); // tilde
method.invoke(WinAnsiEncoding.INSTANCE, 0267, "middot"); // periodcentered
method.invoke(SymbolEncoding.INSTANCE, 0042, "forall"); // universal
method.invoke(SymbolEncoding.INSTANCE, 0100, "approximatelyequal"); // congruent
method.invoke(SymbolEncoding.INSTANCE, 0127, "Ohm"); // Omega
method.invoke(SymbolEncoding.INSTANCE, 0363, "integraltop"); // integraltp
method.invoke(SymbolEncoding.INSTANCE, 0365, "integralbottom"); // integralbt
{code}

I think the intent of the glyph list is to record mappings in historical order, 
so that amendments come later. The code that reads the list should therefore 
keep the last mapping seen for each code point and treat it as the canonical 
one.

In other words, remove the conditional guarding 
{{unicodeToName.put(string, name);}}:

{code:java}
-                   // reverse mapping
+                   // reverse mapping (keep the latest name as canonical)
-                   if (!unicodeToName.containsKey(string))
-                   {
                        unicodeToName.put(string, name);
-                   }
{code}
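
To illustrate the difference between the two policies outside of PDFBox, here is a minimal, self-contained sketch. It is not the actual {{GlyphList}} code: the class {{GlyphNameDemo}}, its method {{reverseMap}} and the one-name stand-in for an encoding are hypothetical, and only the two glyph list entries quoted in this issue are used.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; it only mimics the reverse-mapping step discussed above.
class GlyphNameDemo {
    // The two entries quoted in this issue, in glyph list order (earlier first).
    static final String[][] ENTRIES = {
        {"ilde", "\u02DC"},
        {"tilde", "\u02DC"},
    };

    // Builds the Unicode-to-name map, keeping either the first or the last name seen.
    static Map<String, String> reverseMap(boolean keepFirst) {
        Map<String, String> unicodeToName = new HashMap<>();
        for (String[] entry : ENTRIES) {
            String name = entry[0];
            String unicode = entry[1];
            if (keepFirst) {
                // Same effect as guarding put() with !containsKey(...)
                unicodeToName.putIfAbsent(unicode, name);
            } else {
                // Unconditional put(): later entries replace earlier ones
                unicodeToName.put(unicode, name);
            }
        }
        return unicodeToName;
    }

    public static void main(String[] args) {
        // Assumption: stand-in for an encoding that only knows the later name.
        Set<String> encodingNames = Collections.singleton("tilde");
        for (boolean keepFirst : new boolean[] {true, false}) {
            String name = reverseMap(keepFirst).get("\u02DC");
            System.out.println((keepFirst ? "keep first: " : "keep last: ") + name
                    + ", known to encoding: " + encodingNames.contains(name));
        }
    }
}
```

With the first-wins policy the lookup yields "ilde", which the stand-in encoding does not contain; with the last-wins policy it yields "tilde", which it does. This only shows the mechanism for one code point; whether last-wins is right for every duplicate in the real list would still need checking.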


> GlyphList registers "wrong" Adobe name for "U+02DC SMALL TILDE"
> ---------------------------------------------------------------
>
>                 Key: PDFBOX-3884
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3884
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.6
>            Reporter: Matías Giovannini
>            Priority: Minor
>         Attachments: PDFEncodingError.java
>
>
> The Adobe Glyph List contains both "ilde;02DC" (line 2304) and "tilde;02DC" 
> (line 3826), so the Unicode conversion of ExtendedRoman 0x98 (152) "small 
> tilde" fails:
> java.lang.IllegalArgumentException: U+02DC ('ilde') is not available in this 
> font Times-Roman encoding: WinAnsiEncoding
>       at 
> org.apache.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:425)
>       at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:323)
>       at 
> org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:414)
>       at pdftest.PDFEncodingError.main(PDFEncodingError.java:18)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

