[ 
https://issues.apache.org/jira/browse/PDFBOX-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073861#comment-18073861
 ] 

Stefan Ziegler commented on PDFBOX-6193:
----------------------------------------

Simple test code:


static final String BASE25 = "BCDEFGHIJKLMNOPQRSTUVWXYZ";/** Verbatim copy from 
PDFBox 3.0.7 TrueTypeEmbedder.getTag(). */static String getTag(Map<Integer, 
Integer> gidToCid) {        long num = gidToCid.hashCode();
        StringBuilder sb = new StringBuilder(); do {            long div = num 
/ 25;            int mod = (int) (num % 25);
                sb.append(BASE25.charAt(mod));
                num = div;
        } while (num != 0 && sb.length() < 6);  while (sb.length() < 6) {
                sb.insert(0, 'A');
        }
        sb.append('+'); return sb.toString();
}public static void main(String[] args) {
        Map<Integer, Integer> m = new HashMap<>();
        m.put(1, Integer.MIN_VALUE);    System.out.println("Map.hashCode() = " 
+ m.hashCode()); System.out.println("getTag()       = " + getTag(m));
}

> TrueTypeEmbedder.getTag() throws StringIndexOutOfBoundsException for negative 
> Map.hashCode()
> --------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-6193
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6193
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 3.0.7 PDFBox
>            Reporter: Stefan Ziegler
>            Priority: Major
>
> {{TrueTypeEmbedder.getTag(Map<Integer, Integer> gidToCid)}} produces a 
> six-character subset prefix from the hash code of the GID map. When the hash 
> code is negative, the method crashes with {{StringIndexOutOfBoundsException}} 
> instead of returning a valid tag.
> The relevant code (lines 365–388 of {{TrueTypeEmbedder.java}} in 3.0.7):
>  
> {code:java}
> public String getTag(Map<Integer, Integer> gidToCid)
> {
>     // deterministic
>     long num = gidToCid.hashCode();
>     // base25 encode
>     StringBuilder sb = new StringBuilder();
>     do
>     {
>         long div = num / 25;
>         int mod = (int)(num % 25);
>         sb.append(BASE25.charAt(mod));   // <-- crashes when mod is negative
>         num = div;
>     } while (num != 0 && sb.length() < 6);
>     ...
> }
> {code}
>  
> *Root cause*
> In Java, the {{%}} operator returns a result whose sign matches the sign of 
> the left operand. Therefore, when {{num}} is negative, {{num % 25}} is in the 
> range {{{}[-24, 0]{}}}, and {{BASE25.charAt(negativeIndex)}} immediately 
> throws {{{}StringIndexOutOfBoundsException{}}}.
> {{Map.hashCode()}} is the sum of entry hash codes per the {{Map.hashCode()}} 
> contract; for sufficiently large or value-diverse maps this sum frequently 
> overflows or otherwise becomes negative. In practice this means the bug is 
> triggered probabilistically depending on the specific glyphs being subset.
>  
> The crash also occurs naturally when subsetting larger fonts whose resulting 
> {{gidToCid}} map happens to hash to a negative value — there is roughly a 50% 
> chance per subset, depending on the glyphs used in the document.
>  
> *Suggested fix*
> Use {{{}Math.floorMod{}}}, which is defined to return a non-negative result 
> for a positive divisor:
> {code:java}
> public String getTag(Map<Integer, Integer> gidToCid)
> {
>     long num = gidToCid.hashCode();    StringBuilder sb = new StringBuilder();
>     do
>     {
>         int mod = (int) Math.floorMod(num, 25L);
>         sb.append(BASE25.charAt(mod));
>         num = Math.floorDiv(num, 25L);
>     } while (num != 0 && sb.length() < 6);    while (sb.length() < 6)
>     {
>         sb.insert(0, 'A');
>     }    sb.append('+');
>     return sb.toString();
> }{code}
> Note that {{num / 25}} must also be replaced with {{Math.floorDiv(num, 25)}} 
> to keep the loop's termination condition ({{{}num != 0{}}}) consistent with 
> the new {{mod}} semantics. Without that change, the loop can spin between -1 
> and 1 indefinitely (capped at 6 iterations by the length guard, but producing 
> wrong tags).
> Alternatively, {{Math.abs(num) % 25}} works for {{{}int{}}}-derived hash 
> codes (since {{Math.abs}} of a widened {{int}} is always representable as 
> {{{}long{}}}), but {{floorMod}} is the more idiomatic and self-documenting 
> choice.
>  
> *Impact*
> Any caller of {{TrueTypeEmbedder.subset()}} — i.e., the standard 
> font-embedding path used by {{PDType0Font.load()}} and 
> {{PDTrueTypeFont.load()}} when subsetting is enabled (the default) — can hit 
> this crash. The likelihood scales with the number and identity of glyphs in 
> the subset, so it manifests intermittently and is hard to attribute without 
> reading the stack trace carefully.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to