[
https://issues.apache.org/jira/browse/PDFBOX-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073861#comment-18073861
]
Stefan Ziegler commented on PDFBOX-6193:
----------------------------------------
Simple test code:
static final String BASE25 = "BCDEFGHIJKLMNOPQRSTUVWXYZ";/** Verbatim copy from
PDFBox 3.0.7 TrueTypeEmbedder.getTag(). */static String getTag(Map<Integer,
Integer> gidToCid) { long num = gidToCid.hashCode();
StringBuilder sb = new StringBuilder(); do { long div = num
/ 25; int mod = (int) (num % 25);
sb.append(BASE25.charAt(mod));
num = div;
} while (num != 0 && sb.length() < 6); while (sb.length() < 6) {
sb.insert(0, 'A');
}
sb.append('+'); return sb.toString();
}public static void main(String[] args) {
Map<Integer, Integer> m = new HashMap<>();
m.put(1, Integer.MIN_VALUE); System.out.println("Map.hashCode() = "
+ m.hashCode()); System.out.println("getTag() = " + getTag(m));
}
> TrueTypeEmbedder.getTag() throws StringIndexOutOfBoundsException for negative
> Map.hashCode()
> --------------------------------------------------------------------------------------------
>
> Key: PDFBOX-6193
> URL: https://issues.apache.org/jira/browse/PDFBOX-6193
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 3.0.7 PDFBox
> Reporter: Stefan Ziegler
> Priority: Major
>
> {{TrueTypeEmbedder.getTag(Map<Integer, Integer> gidToCid)}} produces a
> six-character subset prefix from the hash code of the GID map. When the hash
> code is negative, the method crashes with {{StringIndexOutOfBoundsException}}
> instead of returning a valid tag.
> The relevant code (lines 365–388 of {{TrueTypeEmbedder.java}} in 3.0.7):
>
> {code:java}
> public String getTag(Map<Integer, Integer> gidToCid)
> {
> // deterministic
> long num = gidToCid.hashCode();
> // base25 encode
> StringBuilder sb = new StringBuilder();
> do
> {
> long div = num / 25;
> int mod = (int)(num % 25);
> sb.append(BASE25.charAt(mod)); // <-- crashes when mod is negative
> num = div;
> } while (num != 0 && sb.length() < 6);
> ...
> }
> {code}
>
> *Root cause*
> In Java, the {{%}} operator returns a result whose sign matches the sign of
> the left operand. Therefore, when {{num}} is negative, {{num % 25}} is in the
> range {{{}[-24, 0]{}}}, and {{BASE25.charAt(negativeIndex)}} immediately
> throws {{{}StringIndexOutOfBoundsException{}}}.
> {{Map.hashCode()}} is the sum of entry hash codes per the {{Map.hashCode()}}
> contract; for sufficiently large or value-diverse maps this sum frequently
> overflows or otherwise becomes negative. In practice this means the bug is
> triggered probabilistically depending on the specific glyphs being subset.
>
> The crash also occurs naturally when subsetting larger fonts whose resulting
> {{gidToCid}} map happens to hash to a negative value — there is roughly a 50%
> chance per subset, depending on the glyphs used in the document.
>
> *Suggested fix*
> Use {{{}Math.floorMod{}}}, which is defined to return a non-negative result
> for a positive divisor:
> {code:java}
> public String getTag(Map<Integer, Integer> gidToCid)
> {
> long num = gidToCid.hashCode(); StringBuilder sb = new StringBuilder();
> do
> {
> int mod = (int) Math.floorMod(num, 25L);
> sb.append(BASE25.charAt(mod));
> num = Math.floorDiv(num, 25L);
> } while (num != 0 && sb.length() < 6); while (sb.length() < 6)
> {
> sb.insert(0, 'A');
> } sb.append('+');
> return sb.toString();
> }{code}
> Note that {{num / 25}} must also be replaced with {{Math.floorDiv(num, 25)}}
> to keep the loop's termination condition ({{{}num != 0{}}}) consistent with
> the new {{mod}} semantics. Without that change, the loop can spin between -1
> and 1 indefinitely (capped at 6 iterations by the length guard, but producing
> wrong tags).
> Alternatively, {{Math.abs(num) % 25}} works for {{{}int{}}}-derived hash
> codes (since {{Math.abs}} of a widened {{int}} is always representable as
> {{{}long{}}}), but {{floorMod}} is the more idiomatic and self-documenting
> choice.
>
> *Impact*
> Any caller of {{TrueTypeEmbedder.subset()}} — i.e., the standard
> font-embedding path used by {{PDType0Font.load()}} and
> {{PDTrueTypeFont.load()}} when subsetting is enabled (the default) — can hit
> this crash. The likelihood scales with the number and identity of glyphs in
> the subset, so it manifests intermittently and is hard to attribute without
> reading the stack trace carefully.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]