yugan95 opened a new pull request, #8158:
URL: https://github.com/apache/paimon/pull/8158

   ### Purpose
   
   Linked issue: #7620
   
   `HeapBytesVector.reserveBytes()` computes `newCapacity * 2` using plain
`int` multiplication. When `newCapacity` exceeds `Integer.MAX_VALUE / 2`
(~1.07 billion bytes), the product overflows and wraps to a negative value,
causing a `NegativeArraySizeException` or silent data corruption during
compaction reads of large records.
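The standalone Java snippet below is illustrative only, not Paimon code, and its class and method names are made up. It shows the wrap-around: doubling a capacity just above `Integer.MAX_VALUE / 2` with `int` arithmetic yields a negative number, while widening one operand to `long` first gives the exact result.

```java
class DoublingOverflowDemo {

    // Doubling with int arithmetic: the product silently wraps around
    // once the input exceeds Integer.MAX_VALUE / 2.
    static int doubleAsInt(int capacity) {
        return capacity * 2;
    }

    // Widening one operand to long first makes the multiplication exact.
    static long doubleAsLong(int capacity) {
        return (long) capacity * 2;
    }

    public static void main(String[] args) {
        int capacity = 1_100_000_000; // ~1.1 billion bytes, just above Integer.MAX_VALUE / 2
        System.out.println(doubleAsInt(capacity));  // -2094967296
        System.out.println(doubleAsLong(capacity)); // 2200000000
        // new byte[doubleAsInt(capacity)] would throw NegativeArraySizeException.
    }
}
```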
   
   #### Root Cause
   
   The callers `putByteArray` and `fill` also compute the required capacity
with `int` arithmetic (`bytesAppended + length` and `start.length * value.length`).
These expressions can overflow to a negative or smaller positive value before
the `MAX_ARRAY_SIZE` guard sees them, silently bypassing the safety check.
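Here is a hypothetical illustration of how the callers' `int` expressions slip past a `> MAX_ARRAY_SIZE` check. It assumes `MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8`, a common JVM-safe limit; Paimon's actual constant may differ. The helper names are invented for this sketch, and `startLength`/`valueLength` only mirror the shape of `start.length * value.length`.

```java
class CallerOverflowDemo {

    // Assumption: a common JVM-safe array limit, not necessarily Paimon's value.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // A guard that only rejects values above the limit: a wrapped
    // (negative or small positive) value passes straight through.
    static boolean guardRejects(long required) {
        return required > MAX_ARRAY_SIZE;
    }

    static int sumAsInt(int bytesAppended, int length) {
        return bytesAppended + length; // may wrap to a negative value
    }

    static long sumAsLong(int bytesAppended, int length) {
        return (long) bytesAppended + length; // exact
    }

    static int productAsInt(int startLength, int valueLength) {
        return startLength * valueLength; // may wrap to a smaller positive value
    }

    static long productAsLong(int startLength, int valueLength) {
        return (long) startLength * valueLength; // exact
    }

    public static void main(String[] args) {
        System.out.println(guardRejects(sumAsInt(2_000_000_000, 200_000_000)));  // false: overflow slips past
        System.out.println(guardRejects(sumAsLong(2_000_000_000, 200_000_000))); // true: overflow caught

        System.out.println(productAsInt(65_537, 65_537));  // 131073, far too small an allocation
        System.out.println(productAsLong(65_537, 65_537)); // 4295098369
    }
}
```

Widening before the addition or multiplication, as the PR does, lets the guard see the true requirement instead of a wrapped value.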
   
   #### Changes
   
   - Promote the callers' capacity computations to `long` arithmetic before
calling `reserveBytes(long)`
   - Extract `calculateNewBytesCapacity(long)` as a package-private static
helper
   - Return the exact required capacity when doubling would exceed
`MAX_ARRAY_SIZE`, instead of capping at `MAX_ARRAY_SIZE` (which would
allocate ~2 GB for a ~1.1 GB request)
   - Throw a `RuntimeException` with a clear message, including the capacity
details, when the required capacity exceeds `MAX_ARRAY_SIZE`
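A minimal sketch of the sizing policy described in the changes above, applied to a toy growable byte buffer. This is not the PR's implementation. `BytesCapacitySketch`, the initial buffer size, the exception messages, the negative-input check, and `MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8` are all assumptions. Only the behavior follows the PR description: double when safe, otherwise return the exact size, throw past the limit, and use `long` arithmetic in the caller.

```java
import java.util.Arrays;

class BytesCapacitySketch {

    // Assumption: a JVM-safe array limit; Paimon's actual MAX_ARRAY_SIZE may differ.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    private byte[] buffer = new byte[16]; // hypothetical initial size
    private int bytesAppended = 0;

    // Package-private static helper (no access modifier), modeled on the PR's description.
    static int calculateNewBytesCapacity(long newCapacity) {
        if (newCapacity < 0) {
            // Extra defensive check added for this sketch only.
            throw new IllegalArgumentException("Negative bytes capacity: " + newCapacity);
        }
        if (newCapacity > MAX_ARRAY_SIZE) {
            throw new RuntimeException(
                    "Required bytes capacity " + newCapacity
                            + " exceeds the maximum array size " + MAX_ARRAY_SIZE);
        }
        if (newCapacity <= (MAX_ARRAY_SIZE >> 1)) {
            return (int) (newCapacity * 2); // doubling stays within the limit here
        }
        return (int) newCapacity; // exact size: doubling would exceed the limit
    }

    // Takes a long so an overflowing int sum never reaches the guard.
    void reserveBytes(long newCapacity) {
        if (newCapacity > buffer.length) {
            buffer = Arrays.copyOf(buffer, calculateNewBytesCapacity(newCapacity));
        }
    }

    void putByteArray(byte[] value, int offset, int length) {
        reserveBytes((long) bytesAppended + length); // long arithmetic, no wrap-around
        System.arraycopy(value, offset, buffer, bytesAppended, length);
        bytesAppended += length;
    }

    byte[] contents() {
        return Arrays.copyOf(buffer, bytesAppended);
    }

    public static void main(String[] args) {
        System.out.println(calculateNewBytesCapacity(100));                 // 200
        System.out.println(calculateNewBytesCapacity(MAX_ARRAY_SIZE >> 1)); // 2147483638
        System.out.println(calculateNewBytesCapacity(1_100_000_000L));      // 1100000000
        try {
            calculateNewBytesCapacity(2_200_000_000L);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }

        BytesCapacitySketch vector = new BytesCapacitySketch();
        vector.putByteArray(new byte[] {1, 2, 3}, 0, 3);
        vector.putByteArray(new byte[20], 0, 20); // grows past the initial 16 bytes
        System.out.println(vector.contents().length); // 23
    }
}
```

Returning the exact size near the limit trades a possible extra reallocation on a later append for not reserving ~2 GB up front. Past `MAX_ARRAY_SIZE >> 1`, a full doubling could never fit anyway.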
   
   ### Tests
   
   `HeapBytesVectorReserveBytesTest` adds 11 test cases covering:
   - Normal doubling for small values
   - Boundary at `MAX_ARRAY_SIZE >> 1` (still doubles)
   - Just above half of `MAX_ARRAY_SIZE` (returns the exact capacity, not
`MAX_ARRAY_SIZE`)
   - Exactly `MAX_ARRAY_SIZE` (returns the exact capacity)
   - Above `MAX_ARRAY_SIZE` (throws)
   - Values that would overflow `int`, simulated via `long` (throws)
   - End-to-end `putByteArray` data correctness
   
   ### API and Format
   
   N/A
   
   ### Documentation
   
   N/A


-- 
This is an automated message from the Apache Git Service.