Can you create a bug in JBS (if there isn't one already)? In general there are a lot of issues with java.io.PipedXXX, and discussions come up periodically on whether to just deprecate these JDK 1.0/1.1 classes.

-Alan

On 11/03/2026 12:08, wenshao wrote:
Hi,
  I found an integer overflow bug in String.encodedLengthUTF8() where
  the LATIN1 code path uses an int accumulator without an overflow check,
  while the UTF16 path correctly uses long.
  The bug:
  LATIN1 path (lines 1499-1517):
      private static int encodedLengthUTF8(byte coder, byte[] val) {
          if (coder == UTF16) {
              return encodedLengthUTF8_UTF16(val, null); // ← uses long dp, has overflow check
          }
          int positives = StringCoding.countPositives(val, 0, val.length);
          if (positives == val.length) {
              return positives;
          }
          int dp = positives; // ← int, no overflow protection
          for (int i = dp; i < val.length; i++) {
              if (val[i] < 0) dp += 2;
              else dp++;
          }
          return dp; // ← may have overflowed
      }
  UTF16 path (encodedLengthUTF8_UTF16, lines 1596-1642):
      long dp = 0L; // ← long
      ...
      if (dp > (long)Integer.MAX_VALUE) { // ← overflow check
          throw new OutOfMemoryError("Required length exceeds implementation limit");
      }
      return (int) dp;
  When a LATIN1 string contains more than Integer.MAX_VALUE / 2
  non-ASCII bytes (~1 GB of 0x80-0xFF), each byte encodes to 2 UTF-8
  bytes, so dp exceeds Integer.MAX_VALUE and wraps to negative.
  This causes NegativeArraySizeException in downstream buffer
  allocation, instead of OutOfMemoryError.
  Analytical proof:
  length = Integer.MAX_VALUE / 2 + 1 = 1,073,741,824
  correct result (long) = 2 * 1,073,741,824 = 2,147,483,648
  overflowed result (int) = -2,147,483,648 // silent overflow!
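The arithmetic above can be checked without allocating a large string. The following is a minimal sketch; the class and method names are hypothetical and are not part of the JDK or of the patch. It only reproduces the worst case where every LATIN1 byte is non-ASCII and therefore needs two UTF-8 bytes.

```java
class Utf8LengthOverflowDemo {
    // Worst case for a LATIN1 string: every byte is in 0x80-0xFF,
    // so each one encodes to 2 UTF-8 bytes. Computed in long, no wraparound.
    static long encodedLengthAsLong(int nonAsciiBytes) {
        return 2L * nonAsciiBytes;
    }

    // The same computation with int arithmetic, which wraps silently on overflow.
    static int encodedLengthAsInt(int nonAsciiBytes) {
        return 2 * nonAsciiBytes;
    }

    public static void main(String[] args) {
        int length = Integer.MAX_VALUE / 2 + 1;          // 1,073,741,824
        System.out.println(encodedLengthAsLong(length)); // correct value as a long
        System.out.println(encodedLengthAsInt(length));  // wrapped value as an int
    }
}
```

A negative length of this kind is what later reaches the buffer allocation and surfaces as NegativeArraySizeException.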
  The fix:
  Align LATIN1 path with UTF16 path:
      long dp = positives;
      for (int i = positives; i < val.length; i++) {
          if (val[i] < 0) dp += 2;
          else dp++;
      }
      if (dp > (long)Integer.MAX_VALUE) {
          throw new OutOfMemoryError("Required length exceeds implementation limit");
      }
      return (int) dp;
  Note: for (int i = dp; ...) changed to for (int i = positives; ...)
  to avoid implicit long→int narrowing after dp changed to long.
  This is semantically equivalent since dp == positives at loop entry.
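For readers who want to try the proposed logic outside the JDK, here is a standalone sketch of it. It is an illustration under stated assumptions, not the patch itself: countLeadingAscii is a hypothetical stand-in for the JDK-internal StringCoding.countPositives, and checkedLength is a hypothetical helper split out only so that the overflow check can be exercised without a multi-gigabyte array.

```java
import java.nio.charset.StandardCharsets;

class Latin1Utf8Length {
    // Hypothetical stand-in for StringCoding.countPositives:
    // counts the leading bytes that are non-negative (ASCII).
    static int countLeadingAscii(byte[] val) {
        int i = 0;
        while (i < val.length && val[i] >= 0) {
            i++;
        }
        return i;
    }

    // Narrowing step with the same check and message as the UTF16 path.
    static int checkedLength(long dp) {
        if (dp > (long) Integer.MAX_VALUE) {
            throw new OutOfMemoryError("Required length exceeds implementation limit");
        }
        return (int) dp;
    }

    // UTF-8 encoded length of LATIN1 bytes, accumulated in a long.
    static int encodedLengthLatin1(byte[] val) {
        int positives = countLeadingAscii(val);
        if (positives == val.length) {
            return positives; // pure ASCII: one UTF-8 byte per input byte
        }
        long dp = positives;
        for (int i = positives; i < val.length; i++) {
            dp += (val[i] < 0) ? 2 : 1; // 0x80-0xFF needs two UTF-8 bytes
        }
        return checkedLength(dp);
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(encodedLengthLatin1(latin1));
        System.out.println("caf\u00e9".getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Both lines printed by main should agree, since the second one asks the JDK for the actual UTF-8 encoding of the same string.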
  No performance impact: long arithmetic has identical cost on 64-bit
  platforms, the overflow check runs once outside the loop, and pure
  ASCII strings exit early at line 1504 before reaching this code.
  The patch includes a jtreg test with small-string correctness
  verification and a large-string overflow test (requires 3GB heap).
  Webrev: https://github.com/wenshao/jdk/tree/fix/string-encodedLengthUTF8-overflow 
  Thanks,
  Shaojin Wen
