JingsongLi commented on PR #8209:
URL: https://github.com/apache/paimon/pull/8209#issuecomment-4678779836
Thanks for the fix here. I think the direction makes sense, but I found a
few follow-up issues before this is safe to merge.
1.
`paimon-flink-common/src/main/java/org/apache/paimon/flink/sink/LocalMergeOperator.java:204`
`flushBuffer()` still uses `if (merger.size() == 0)`.
`SortBufferLocalMerger.size()` delegates to `SortBufferWriteBuffer.size()`, and
this PR changes that path to throw once the record count exceeds
`Integer.MAX_VALUE`. So the same large-buffer scenario will still fail in the
local-merge path during flush/close. I think this path also needs a
non-`size()` emptiness check, or `LocalMerger` needs an `isEmpty()` API.
2.
`paimon-core/src/main/java/org/apache/paimon/crosspartition/GlobalIndexAssigner.java:210`
`bootstrapKeys` is also a `BinaryExternalSortBuffer`, but this code still
checks `bootstrapKeys.size() > 0`. After this PR, a bootstrap with more than
`Integer.MAX_VALUE` keys will now fail with a `RuntimeException` here instead
of silently misbehaving. In other words, the overflow fix is only applied to
`MergeTreeWriter` / `Sorter` / `SortOperator`, but the same issue is still
present in the global-index bootstrap path.
3.
`paimon-core/src/main/java/org/apache/paimon/sort/BinaryExternalSortBuffer.java:125`
The new guard uses `numRecords >= Integer.MAX_VALUE`, but
`Integer.MAX_VALUE` itself is still a valid `int` return value. This means
`size()` now throws one step too early at exactly `2147483647`, which is also
inconsistent with the exception message (`exceeds Integer.MAX_VALUE`). I think
this should be `>`, and it would be good to add an exact-boundary test for
`Integer.MAX_VALUE` as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]