Liulietong opened a new pull request, #7221:
URL: https://github.com/apache/paimon/pull/7221
### Purpose
Fix #7220
When `sequence.field` is configured, `LookupMergeFunction.pickHighLevel()`
should select the record with the highest sequence value rather than the
record with the lowest level number.
### Problem
Previously, `pickHighLevel()` only compared level numbers:
```java
if (highLevel == null || kv.level() < highLevel.level()) {
    highLevel = kv;
}
```
This could lead to incorrect results when out-of-order data arrives:
- L1 has sequence=7 (older)
- L2 has sequence=8 (newer)
- L0 has sequence=6 (oldest, out-of-order arrival)
The old logic picks L1 because it has the lower level number (1 < 2), but the
correct choice is L2 because it has the higher sequence value (8 > 7).
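The scenario above can be reproduced with a minimal, self-contained sketch. This is not the code from the PR: `Kv`, `pickByLevel`, and `pickBySequence` are hypothetical stand-ins, the skip of level-0 records is an assumption inferred from the example (L0 is never picked), and the tie-break on equal sequence values is an assumption as well.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PickHighLevelSketch {

    /** Hypothetical stand-in for a key-value record: only level and sequence. */
    public static final class Kv {
        public final int level;
        public final long sequence;

        public Kv(int level, long sequence) {
            this.level = level;
            this.sequence = sequence;
        }
    }

    /** Level-only pick, mirroring the old comparison: lowest level number wins. */
    public static Kv pickByLevel(List<Kv> candidates) {
        Kv high = null;
        for (Kv kv : candidates) {
            if (kv.level <= 0) {
                continue; // assumption: level-0 records are not candidates here
            }
            if (high == null || kv.level < high.level) {
                high = kv;
            }
        }
        return high;
    }

    /** Sequence-aware pick; falls back to the level-only pick without a comparator. */
    public static Kv pickBySequence(List<Kv> candidates, Comparator<Kv> sequenceComparator) {
        if (sequenceComparator == null) {
            return pickByLevel(candidates);
        }
        Kv high = null;
        for (Kv kv : candidates) {
            if (kv.level <= 0) {
                continue; // same assumption as above
            }
            if (high == null) {
                high = kv;
                continue;
            }
            int cmp = sequenceComparator.compare(kv, high);
            // assumption: on equal sequence values, keep the lower level number
            if (cmp > 0 || (cmp == 0 && kv.level < high.level)) {
                high = kv;
            }
        }
        return high;
    }

    public static void main(String[] args) {
        // L1 seq=7, L2 seq=8, L0 seq=6, as in the example above
        List<Kv> candidates = Arrays.asList(new Kv(1, 7), new Kv(2, 8), new Kv(0, 6));
        Comparator<Kv> bySequence = Comparator.comparingLong(kv -> kv.sequence);
        System.out.println("level-only pick: L" + pickByLevel(candidates).level);
        System.out.println("sequence pick:   L" + pickBySequence(candidates, bySequence).level);
    }
}
```

Passing `null` as the comparator models a table without `sequence.field`, in which case the sketch behaves exactly like the level-only pick.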
### Changes
1. Add `sequenceComparator` field to `LookupMergeFunction`
2. Modify `pickHighLevel()` to use sequence comparator when available
3. Modify `getResult()` to sort records by sequence before adding to merge
function
4. Only set `sequenceComparator` when user-defined sequence field is
configured, preserving original behavior when `sequence.field` is not set
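Changes 3 and 4 can be illustrated with a rough sketch of why ordering matters before records reach the merge function. Everything here is hypothetical: `Kv`, `LastWinsMerge`, and `merge` are illustrative names, the merge function is assumed to be last-write-wins, and which records the real `getResult()` passes on (and in what order) is not shown in this description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortBeforeMergeSketch {

    /** Hypothetical record: level, sequence value, and a payload. */
    public static final class Kv {
        public final int level;
        public final long sequence;
        public final String value;

        public Kv(int level, long sequence, String value) {
            this.level = level;
            this.sequence = sequence;
            this.value = value;
        }
    }

    /** Hypothetical last-write-wins merge: the last record added is the result. */
    public static final class LastWinsMerge {
        private Kv latest;

        public void add(Kv kv) {
            latest = kv;
        }

        public Kv getResult() {
            return latest;
        }
    }

    /**
     * Feeds candidates to the merge function. With a comparator, records are
     * sorted ascending by sequence first so the newest is added last; with a
     * null comparator (no sequence.field), the given order is preserved.
     */
    public static Kv merge(List<Kv> candidates, Comparator<Kv> sequenceComparator) {
        List<Kv> ordered = new ArrayList<>(candidates);
        if (sequenceComparator != null) {
            ordered.sort(sequenceComparator); // List.sort is stable
        }
        LastWinsMerge mergeFunction = new LastWinsMerge();
        for (Kv kv : ordered) {
            mergeFunction.add(kv);
        }
        return mergeFunction.getResult();
    }

    public static void main(String[] args) {
        // The picked high-level record (L2, seq=8) followed by the late L0 record (seq=6)
        List<Kv> candidates = Arrays.asList(new Kv(2, 8, "newer"), new Kv(0, 6, "stale"));
        Comparator<Kv> bySequence = Comparator.comparingLong(kv -> kv.sequence);
        System.out.println("unsorted: " + merge(candidates, null).value);
        System.out.println("sorted:   " + merge(candidates, bySequence).value);
    }
}
```

Keeping the comparator `null` unless a sequence field is configured is what leaves the original ordering, and therefore the original result, untouched in the sketch.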
### Tests
Added 3 test cases:
- `testSequenceFieldWithMultipleLevels` - verifies `sequence.field` is used
correctly
- `testWithoutSequenceFieldPreservesOriginalBehavior` - verifies backward
compatibility
- `testSequenceFieldWithDescendingSortOrder` - verifies descending sort
order works
### Impact
Only affects `changelog-producer = lookup` with `sequence.field` configured.
No impact on:
- Normal queries (Batch/Streaming Scan)
- Lookup Join
- Tables without `sequence.field`