yugan95 opened a new pull request, #8160:
URL: https://github.com/apache/paimon/pull/8160
### Purpose
Linked issue: #7620
During external merge sort, each merge channel holds a reusable `BinaryRow`
instance that is filled via `BinaryRowSerializer.deserialize(reuse, source)`.
When a large record is deserialized, the backing `MemorySegment` grows to fit
it but is never shrunk for subsequent small records. With up to
`max-num-file-handles` (default 128) channels each retaining a 100MB+ buffer,
retained memory can exceed 12GB and lead to OOM.
#### Changes
- **`BinaryRowSerializer`**: add shrink logic with hysteresis in
`deserialize(BinaryRow reuse, DataInputView source)` — reallocate only when the
existing buffer exceeds 4MB **and** the current record is smaller than 4MB
  - Sustained large records (5–10MB): buffer retained, no thrashing
  - Large record → small records: buffer shrunk, OOM protection
  - Normal small records (< 4MB): no behavior change
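
For reviewers, the shrink rule described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: it uses a plain `byte[]` in place of `MemorySegment`, and the names `ReuseBufferSketch`, `SHRINK_THRESHOLD`, `prepare`, and `capacity` are hypothetical.

```java
// Hypothetical stand-in for the reuse buffer behind a BinaryRow.
// A plain byte[] replaces MemorySegment to keep the sketch self-contained.
final class ReuseBufferSketch {
    // 4MB threshold taken from the PR description.
    static final int SHRINK_THRESHOLD = 4 * 1024 * 1024;

    private byte[] buffer = new byte[0];

    // Returns a buffer that can hold `length` bytes, reallocating only if needed.
    byte[] prepare(int length) {
        // Grow: the existing buffer cannot hold the record.
        boolean tooSmall = buffer.length < length;
        // Shrink: the buffer is oversized AND the current record is small.
        // Records of 4MB or more keep the large buffer, which avoids thrashing.
        boolean oversized =
                buffer.length > SHRINK_THRESHOLD && length < SHRINK_THRESHOLD;
        if (tooSmall || oversized) {
            buffer = new byte[length];
        }
        return buffer;
    }

    int capacity() {
        return buffer.length;
    }

    public static void main(String[] args) {
        ReuseBufferSketch buf = new ReuseBufferSketch();
        int mb = 1024 * 1024;
        int[] recordSizes = {10 * mb, 5 * mb, 1024, 2048, 1024};
        for (int size : recordSizes) {
            buf.prepare(size);
            System.out.println("record=" + size + " capacity=" + buf.capacity());
        }
    }
}
```

In this sketch, a 10MB record followed by a 5MB record keeps the 10MB buffer, while a following 1KB record triggers a reallocation down to 1KB. Later small records only grow the buffer when they do not fit.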
### Tests
`BinaryRowSerializerShrinkTest` — 4 test cases covering:
- Oversized buffer shrunk when transitioning to small records
- Small buffer reused without shrinking
- Buffer grows when a larger record arrives
- Consecutive large records retain the buffer (hysteresis, no thrashing)
### API and Format
N/A
### Documentation
N/A