wzx140 opened a new pull request, #8209:
URL: https://github.com/apache/paimon/pull/8209

   ### Purpose
   
   `BinaryExternalSortBuffer#numRecords` was an `int`, so once the number of 
buffered records exceeds `Integer.MAX_VALUE` it silently overflows and can wrap 
to a negative value.
   
   When `numRecords` wraps to a negative value, `WriteBuffer#size()` 
returns a negative number, so the guard `if (writeBuffer.size() > 0)` in 
`MergeTreeWriter#flushWriteBuffer` evaluates to `false` and the flush is 
skipped. The buffered records are then silently discarded, which results in 
data loss.
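
The failure mode can be reproduced in isolation. The following is a minimal sketch, not Paimon code: the class `IntCounterOverflowDemo` and its methods are hypothetical names, and the counter is started at `Integer.MAX_VALUE` rather than filled with real records. It shows only how an `int` counter wraps and why a `size() > 0` guard then fails.

```java
// Hypothetical demo class for illustration only; it is not part of Paimon.
final class IntCounterOverflowDemo {

    /** Simulates an int record counter that is incremented once past Integer.MAX_VALUE. */
    static int intCountAfterOneMoreRecord() {
        int numRecords = Integer.MAX_VALUE;
        numRecords++; // wraps silently to Integer.MIN_VALUE, no exception is thrown
        return numRecords;
    }

    /** The same increment with a long counter keeps the true count. */
    static long longCountAfterOneMoreRecord() {
        long numRecords = Integer.MAX_VALUE;
        numRecords++;
        return numRecords;
    }

    /** Same shape as the guard described above: flush only when size() > 0. */
    static boolean wouldFlush(int size) {
        return size > 0;
    }

    public static void main(String[] args) {
        int wrapped = intCountAfterOneMoreRecord();
        System.out.println("int counter:        " + wrapped);
        System.out.println("long counter:       " + longCountAfterOneMoreRecord());
        System.out.println("flush guard passes: " + wouldFlush(wrapped));
    }
}
```

With the wrapped counter the guard is `false` even though the buffer holds more than two billion records, which matches the data-loss scenario described above.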
   
   This PR:
   - Widens `BinaryExternalSortBuffer#numRecords` to `long` to avoid overflow.
   - Adds `isEmpty()` to the `SortBuffer` and `WriteBuffer` interfaces and 
their implementations (`BinaryExternalSortBuffer`, `BinaryInMemorySortBuffer`, 
`SortBufferWriteBuffer`), and replaces emptiness checks of the form `size() > 
0` with `!isEmpty()` in `Sorter`, `MergeTreeWriter`, and `SortOperator`.
   - Keeps `size()` returning `int`. It now throws when the count exceeds 
`Integer.MAX_VALUE` instead of silently truncating. It is **not** widened to 
`long` because `size()` is part of the `IndexedSortable` contract and is still 
consumed as an `int` by the in-memory sorting algorithms, for example 
`org.apache.paimon.sort.HeapSort`, which calls `sort(s, 0, s.size())`.
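
The combination of a `long` counter, an `isEmpty()` check, and a throwing `size()` can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `CountingBufferSketch` is a hypothetical class, the constructor that seeds the count exists only to make the example cheap to run, and `Math.toIntExact` (which throws `ArithmeticException`) is one possible way to fail loudly. The exception type the PR actually uses is not stated here.

```java
// Hypothetical sketch for illustration only; it is not the Paimon implementation.
final class CountingBufferSketch {

    // A long counter cannot realistically overflow from buffered records.
    private long numRecords;

    /** Seeds the counter so the example does not need billions of writes. */
    CountingBufferSketch(long initialRecords) {
        this.numRecords = initialRecords;
    }

    /** Records one more buffered entry. */
    void add() {
        numRecords++;
    }

    /** Emptiness check that never narrows the count to int. */
    boolean isEmpty() {
        return numRecords == 0;
    }

    /**
     * Still returns int for callers that need one. Math.toIntExact throws
     * ArithmeticException instead of truncating (assumed choice of exception).
     */
    int size() {
        return Math.toIntExact(numRecords);
    }

    public static void main(String[] args) {
        CountingBufferSketch buffer = new CountingBufferSketch(Integer.MAX_VALUE);
        buffer.add(); // the count is now 2^31, which does not fit in an int
        System.out.println("isEmpty: " + buffer.isEmpty());
        try {
            buffer.size();
        } catch (ArithmeticException e) {
            System.out.println("size() refused to truncate");
        }
    }
}
```

A flush guard written as `!buffer.isEmpty()` stays correct past `Integer.MAX_VALUE`, while any caller that genuinely needs an `int` gets an exception rather than a wrong number.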
   
   ### Tests
   
   - Added `SortBufferWriteBufferOverflowTest`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

