wzx140 opened a new pull request, #8209: URL: https://github.com/apache/paimon/pull/8209
### Purpose

`BinaryExternalSortBuffer#numRecords` was an `int`, so once the number of buffered records exceeds `Integer.MAX_VALUE` it silently overflows and wraps to a negative value. `WriteBuffer#size()` then returns a value `< 0`, so the guard `if (writeBuffer.size() > 0)` in `MergeTreeWriter#flushWriteBuffer` evaluates to `false`. The buffered records are never flushed and are silently discarded, resulting in data loss.

This PR:

- Widens `BinaryExternalSortBuffer#numRecords` to `long` to avoid overflow.
- Adds `isEmpty()` to the `SortBuffer` and `WriteBuffer` interfaces and their implementations (`BinaryExternalSortBuffer`, `BinaryInMemorySortBuffer`, `SortBufferWriteBuffer`), and replaces emptiness checks of the form `size() > 0` with `!isEmpty()` in `Sorter`, `MergeTreeWriter`, and `SortOperator`.
- Keeps `size()` returning `int` (it now throws when the count exceeds `Integer.MAX_VALUE` instead of silently truncating). It is **not** widened to `long` because `size()` is part of the `IndexedSortable` contract and is still consumed as an `int` by the in-memory sorting algorithms, e.g. `org.apache.paimon.sort.HeapSort`, which calls `sort(s, 0, s.size())`.

### Tests

- Added `SortBufferWriteBufferOverflowTest`.
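
For illustration, the overflow described under Purpose can be reproduced with a minimal, self-contained sketch. The classes `IntCountingBuffer`, `LongCountingBuffer`, and `OverflowSketch` are hypothetical stand-ins, not Paimon's actual classes. The constructors seed the counter directly so the example does not need to write two billion records, and the use of `Math.toIntExact` (which throws `ArithmeticException`) is an assumption about how a throwing `size()` might look, not necessarily what this PR does.

```java
// Hypothetical stand-in for the old behaviour: a 32-bit record counter.
final class IntCountingBuffer {
    private int numRecords;

    // Seeds the counter for demonstration purposes only.
    IntCountingBuffer(int initialCount) {
        this.numRecords = initialCount;
    }

    void write() {
        numRecords++; // int arithmetic wraps silently past Integer.MAX_VALUE
    }

    int size() {
        return numRecords;
    }
}

// Hypothetical stand-in for the fixed behaviour: a 64-bit record counter.
final class LongCountingBuffer {
    private long numRecords;

    // Seeds the counter for demonstration purposes only.
    LongCountingBuffer(long initialCount) {
        this.numRecords = initialCount;
    }

    void write() {
        numRecords++;
    }

    // Emptiness check that does not depend on the count fitting in an int.
    boolean isEmpty() {
        return numRecords == 0;
    }

    // Assumption: fail loudly instead of truncating when the count
    // no longer fits in an int. Math.toIntExact throws ArithmeticException.
    int size() {
        return Math.toIntExact(numRecords);
    }
}

final class OverflowSketch {
    public static void main(String[] args) {
        IntCountingBuffer old = new IntCountingBuffer(Integer.MAX_VALUE);
        old.write();
        System.out.println(old.size());     // -2147483648
        System.out.println(old.size() > 0); // false: a flush guard would be skipped

        LongCountingBuffer fixed = new LongCountingBuffer(Integer.MAX_VALUE);
        fixed.write();
        System.out.println(!fixed.isEmpty()); // true: a flush guard would still fire
        try {
            fixed.size();
        } catch (ArithmeticException e) {
            System.out.println("size() no longer fits in an int");
        }
    }
}
```

The sketch shows why an `isEmpty()` check is the more robust guard: it stays correct for any record count, while `size()` can remain an `int` for callers that genuinely need an index range.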
