yugan95 opened a new pull request, #8158: URL: https://github.com/apache/paimon/pull/8158
### Purpose

Linked issue: #7620

`HeapBytesVector.reserveBytes()` computes `newCapacity * 2` using plain `int` multiplication. When `newCapacity` exceeds ~1.07 billion bytes, this overflows `Integer.MAX_VALUE`, causing `NegativeArraySizeException` or silent data corruption during compaction reads of large records.

#### Root Cause

The callers `putByteArray` and `fill` also compute required capacity with `int` arithmetic (`bytesAppended + length` and `start.length * value.length`), which can overflow to a negative or smaller positive value before the `MAX_ARRAY_SIZE` guard sees them, silently bypassing the safety check.

#### Changes

- Promote callers to `long` arithmetic before entering `reserveBytes(long)`
- Extract `calculateNewBytesCapacity(long)` as a package-visible static helper
- Return exact required capacity when doubling would exceed `MAX_ARRAY_SIZE` (avoid unnecessary ~2GB allocation for ~1.1GB request)
- Throw clear `RuntimeException` with capacity details when required capacity exceeds `MAX_ARRAY_SIZE`

### Tests

`HeapBytesVectorReserveBytesTest`: 11 test cases covering:

- Normal doubling for small values
- Boundary at `MAX_ARRAY_SIZE >> 1` (still doubles)
- Just above half-max (returns exact capacity, not `MAX_ARRAY_SIZE`)
- Exactly `MAX_ARRAY_SIZE` (returns exact)
- Above `MAX_ARRAY_SIZE` (throws)
- Simulated int-overflow values via long (throws)
- End-to-end `putByteArray` data correctness

### API and Format

N/A

### Documentation

N/A
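
For readers who want to see the capacity rule from the Changes list in isolation, here is a minimal sketch. It is not the code from this PR: the class name `CapacitySketch`, the value chosen for `MAX_ARRAY_SIZE`, the exception message, and the guard against negative input are all assumptions made for illustration. Only the helper name `calculateNewBytesCapacity(long)` and the behaviour at each boundary are taken from the description above.

```java
final class CapacitySketch {

    // Assumption: the PR does not state the actual value of MAX_ARRAY_SIZE.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 15;

    /**
     * Returns the capacity to allocate for a request of {@code required} bytes:
     * double the request while doubling still fits, the exact request otherwise,
     * and an exception when the request itself cannot fit in an array.
     */
    static int calculateNewBytesCapacity(long required) {
        // The negative check is an addition of this sketch; with long arithmetic
        // in the callers it should not be reachable.
        if (required < 0 || required > MAX_ARRAY_SIZE) {
            throw new RuntimeException(
                    "Required capacity " + required
                            + " is outside the supported range [0, " + MAX_ARRAY_SIZE + "]");
        }
        if (required <= (MAX_ARRAY_SIZE >> 1)) {
            // Doubling stays at or below MAX_ARRAY_SIZE, so the cast is safe.
            return (int) (required * 2);
        }
        // Doubling would exceed the limit: allocate exactly what is needed.
        return (int) required;
    }

    public static void main(String[] args) {
        int bytesAppended = 1_500_000_000;
        int length = 1_000_000_000;

        // int arithmetic wraps around to a negative value and would slip past the guard.
        System.out.println("int sum:  " + (bytesAppended + length));

        // Promoting one operand to long keeps the true value, which the guard can reject.
        long required = (long) bytesAppended + length;
        System.out.println("long sum: " + required);

        try {
            calculateNewBytesCapacity(required);
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The key design point is that the comparison against `MAX_ARRAY_SIZE` happens on a `long`, so the check sees the real requested size rather than a wrapped `int`.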
