yugan95 opened a new pull request, #8160:
URL: https://github.com/apache/paimon/pull/8160

   ### Purpose
   
   Linked issue: #7620
   
   During external merge sort, each merge channel holds a reusable `BinaryRow` 
instance that is passed to `BinaryRowSerializer.deserialize(reuse, source)`. When a 
large record is deserialized, the backing `MemorySegment` grows to fit it but is 
never shrunk for subsequent small records. With up to `max-num-file-handles` 
(default 128) channels each retaining a 100MB+ buffer, retained memory can reach 
roughly 12.5GB or more and cause an OOM.
   
   #### Changes
   
   - **`BinaryRowSerializer`**: add shrink logic with hysteresis in 
`deserialize(BinaryRow reuse, DataInputView source)`: shrink (reallocate) the 
buffer only when the existing buffer exceeds 4MB **and** the current record is 
smaller than 4MB
     - Sustained large records (5–10MB): buffer retained, no thrashing
     - Large record → small records: buffer shrunk, OOM protection
     - Normal small records (< 4MB): no behavior change
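
The shrink rule above can be illustrated with a minimal, standalone sketch. It does not use Paimon's `BinaryRow` or `MemorySegment`; the class `ReusableBuffer`, the method `ensure`, and the constant `SHRINK_THRESHOLD` are hypothetical names chosen for illustration, and a plain `byte[]` stands in for the backing segment. Only the 4MB threshold and the two-part condition are taken from the description.

```java
/**
 * Illustrative model of a reuse buffer with shrink hysteresis.
 * Not the actual BinaryRowSerializer code; names here are hypothetical.
 */
final class ReusableBuffer {
    // 4MB threshold as stated in the PR description.
    static final int SHRINK_THRESHOLD = 4 * 1024 * 1024;

    // Stand-in for the backing memory of the reused row.
    private byte[] buffer = new byte[0];

    /** Returns a buffer with room for {@code length} bytes, reallocating only when needed. */
    byte[] ensure(int length) {
        // Grow: the existing buffer cannot hold the record.
        boolean tooSmall = buffer.length < length;
        // Shrink: the buffer is oversized AND the current record is small.
        boolean oversized = buffer.length > SHRINK_THRESHOLD && length < SHRINK_THRESHOLD;
        if (tooSmall || oversized) {
            buffer = new byte[length];
        }
        return buffer;
    }

    int capacity() {
        return buffer.length;
    }

    public static void main(String[] args) {
        int mb = 1024 * 1024;
        ReusableBuffer b = new ReusableBuffer();
        int[] recordSizes = {1024, 16 * mb, 1024, 8 * mb, 5 * mb};
        for (int size : recordSizes) {
            b.ensure(size);
            System.out.println("record=" + size + " capacity=" + b.capacity());
        }
    }
}
```

Because a record of 4MB or more never triggers a shrink, a run of 5–10MB records keeps the existing buffer, while a single small record after a large one releases it.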
   
   ### Tests
   
   `BinaryRowSerializerShrinkTest` — 4 test cases covering:
   - Oversized buffer shrunk when transitioning to small records
   - Small buffer reused without shrinking
   - Buffer grows when a larger record arrives
   - Consecutive large records retain the buffer (hysteresis, no thrashing)
   
   ### API and Format
   
   N/A
   
   ### Documentation
   
   N/A

