On Tue, 17 Mar 2026 18:44:37 GMT, Raffaello Giulietti wrote:

>> The encodedLengthUTF8() method uses an int accumulator (dp) for the LATIN1 
>> code path, while the UTF16 path (encodedLengthUTF8_UTF16) correctly uses a 
>> long accumulator with an overflow check. When a LATIN1 string contains more 
>> than Integer.MAX_VALUE/2 non-ASCII bytes, the int dp overflows, potentially 
>> causing NegativeArraySizeException in downstream buffer allocation.
>> 
>> Fix: change dp from int to long and add the same overflow check used in the 
>> UTF16 path.
>
> src/java.base/share/classes/java/lang/String.java line 1519:
> 
>> 1517:             throw new OutOfMemoryError("Required length exceeds 
>> implementation limit");
>> 1518:         }
>> 1519:         return (int) dp;
> 
> I think you can leave the code as it currently is and throw when `dp < 0`.
> But this variant only works when `dp` is incremented by at most 2 at each 
> iteration, like here.
> Your variant with `long` is more robust.

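Thank you for the suggestion. You're right that checking `dp < 0` would work 
here: `dp` grows by at most 2 per iteration and the array length is at most 
Integer.MAX_VALUE, so an overflowed `int` stays negative until the loop ends. 
However, I prefer to keep the `long` approach because: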

1. It's more explicit and robust - the overflow check is clear rather than 
implicit
2. It matches the existing UTF16 path pattern (encodedLengthUTF8_UTF16)
3. It doesn't rely on the assumption that dp always increments by ≤2, making it 
more maintainable if the code evolves

I expect the performance difference to be negligible, so I believe the clarity 
and robustness are worth the slight verbosity.
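
For illustration, here is a minimal sketch of the `long` accumulator pattern. 
It is not the actual `String.java` code: the class `EncodedLengthSketch` and 
the method `encodedLengthLatin1` are hypothetical names, and the loop is 
simplified to show only the counting and the final check.

```java
public class EncodedLengthSketch {

    // Hypothetical helper, not the JDK's String.encodedLengthUTF8():
    // counts the UTF-8 bytes needed for LATIN1-coded input.
    static int encodedLengthLatin1(byte[] val) {
        long dp = 0L; // long, so the sum cannot wrap for any array length
        for (byte b : val) {
            // ASCII bytes (non-negative) need 1 UTF-8 byte; 0x80..0xFF need 2.
            dp += (b >= 0) ? 1 : 2;
        }
        // Explicit limit check, mirroring the message quoted above.
        if (dp > Integer.MAX_VALUE) {
            throw new OutOfMemoryError("Required length exceeds implementation limit");
        }
        return (int) dp;
    }

    public static void main(String[] args) {
        byte[] latin1 = { 'a', (byte) 0xFF, 'b', (byte) 0xE9 };
        System.out.println(encodedLengthLatin1(latin1)); // 1 + 2 + 1 + 2 = 6
    }
}
```

With an `int` accumulator the same loop would need the `dp < 0` check instead, 
which is only valid while each step adds at most 2.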

> test/jdk/java/lang/String/EncodedLengthUTF8Overflow.java line 111:
> 
>> 109:         }
>> 110:         bigArray = null; // allow GC
>> 111: 
> 
> Have you considered simplifying the above code with just `bigString = 
> String.valueOf(\u00ff).repeat(length)`?

Great suggestion! I've applied this simplification in the latest commit 
(f0c2830e1c1). The `String.repeat()` approach is indeed much cleaner - it 
eliminates the manual byte array allocation, `Arrays.fill()`, and the need for 
explicit GC hints.
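
As a rough sketch of the idea (not the test's actual code), the construction 
can be tried at a small scale. The class `RepeatSketch`, its methods, and the 
length of 1,000 are assumptions chosen for illustration; the real scenario 
needs more than Integer.MAX_VALUE/2 non-ASCII characters and a correspondingly 
large heap.

```java
import java.nio.charset.StandardCharsets;

public class RepeatSketch {

    // Builds a string of `length` copies of U+00FF, a non-ASCII LATIN1 character.
    static String nonAscii(int length) {
        return "\u00ff".repeat(length);
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int length = 1_000; // small stand-in for the very large length the overflow needs
        String s = nonAscii(length);
        // Each U+00FF encodes to 2 UTF-8 bytes, so the encoded length doubles.
        System.out.println(utf8Length(s) == 2 * length);
    }
}
```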

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/30189#discussion_r2951446437
PR Review Comment: https://git.openjdk.org/jdk/pull/30189#discussion_r2951446582
