Liulietong opened a new pull request, #7221:
URL: https://github.com/apache/paimon/pull/7221

   ### Purpose
   
   Fix #7220
   
   When `sequence.field` is configured, `LookupMergeFunction.pickHighLevel()` should select the record with the highest sequence value instead of the record with the lowest level number.
   
   ### Problem
   
   Previously, `pickHighLevel()` only compared level numbers:
   ```java
   if (highLevel == null || kv.level() < highLevel.level()) {
       highLevel = kv;
   }
   ```
   
   This could lead to incorrect results when out-of-order data arrives:
   - L1 has sequence=7 (older)
   - L2 has sequence=8 (newer)
   - L0 has sequence=6 (oldest, out-of-order arrival)
   
   The old logic picks L1 (level 1 < level 2), but the correct choice is L2 (sequence 8 > 7).
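
   The scenario above can be reproduced with a small standalone sketch. `Kv` is a hypothetical stand-in for the real record type, not Paimon's `KeyValue`, and the sketch assumes that level-0 records are skipped when picking the high-level record, which matches the example (L0 is never the candidate). It is only meant to contrast the two selection rules, not to mirror the actual implementation.

```java
import java.util.Arrays;
import java.util.List;

final class PickHighLevelSketch {

    /** Hypothetical stand-in for a key-value record (not Paimon's KeyValue). */
    static final class Kv {
        final int level;
        final long sequence;

        Kv(int level, long sequence) {
            this.level = level;
            this.sequence = sequence;
        }
    }

    /** Old rule: among records with level > 0 (assumption), the lowest level wins. */
    static Kv pickByLevel(List<Kv> candidates) {
        Kv high = null;
        for (Kv kv : candidates) {
            if (kv.level <= 0) {
                continue; // assumption: level-0 records are not high-level candidates
            }
            if (high == null || kv.level < high.level) {
                high = kv;
            }
        }
        return high;
    }

    /** Sequence-aware rule: among records with level > 0, the highest sequence wins. */
    static Kv pickBySequence(List<Kv> candidates) {
        Kv high = null;
        for (Kv kv : candidates) {
            if (kv.level <= 0) {
                continue;
            }
            if (high == null || kv.sequence > high.sequence) {
                high = kv;
            }
        }
        return high;
    }

    /** The three records from the example: L1 seq=7, L2 seq=8, L0 seq=6. */
    static List<Kv> example() {
        return Arrays.asList(new Kv(1, 7), new Kv(2, 8), new Kv(0, 6));
    }
}
```

   With the example records, `pickByLevel` returns the L1 record (sequence 7), while `pickBySequence` returns the L2 record (sequence 8).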
   
   ### Changes
   
   1. Add `sequenceComparator` field to `LookupMergeFunction`
   2. Modify `pickHighLevel()` to use sequence comparator when available
   3. Modify `getResult()` to sort records by sequence before adding them to the merge function
   4. Set `sequenceComparator` only when a user-defined sequence field is configured, preserving the original behavior when `sequence.field` is not set
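
   A rough sketch of how changes 2 to 4 could fit together is shown below. The class, the `Kv` type, and the `mergeOrder()` helper are hypothetical and exist only for illustration; the real `LookupMergeFunction` differs. The sketch assumes a `null` comparator means `sequence.field` is not set, that level-0 records are skipped in `pickHighLevel()`, and that the downstream merge keeps the last record it receives.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SequenceAwareLookupSketch {

    /** Hypothetical stand-in for a key-value record (not Paimon's KeyValue). */
    static final class Kv {
        final int level;
        final long sequence;

        Kv(int level, long sequence) {
            this.level = level;
            this.sequence = sequence;
        }
    }

    /** Null when no user-defined sequence field is configured (assumption). */
    private final Comparator<Kv> sequenceComparator;
    private final List<Kv> candidates = new ArrayList<>();

    SequenceAwareLookupSketch(Comparator<Kv> sequenceComparator) {
        this.sequenceComparator = sequenceComparator;
    }

    /** Convenience comparator ordering records by ascending sequence. */
    static Comparator<Kv> bySequence() {
        return Comparator.comparingLong((Kv kv) -> kv.sequence);
    }

    void add(Kv kv) {
        candidates.add(kv);
    }

    /** Uses the sequence comparator when present, otherwise the level rule. */
    Kv pickHighLevel() {
        Kv high = null;
        for (Kv kv : candidates) {
            if (kv.level <= 0) {
                continue; // assumption: level-0 records are not high-level candidates
            }
            if (high == null || isBetter(kv, high)) {
                high = kv;
            }
        }
        return high;
    }

    private boolean isBetter(Kv kv, Kv current) {
        if (sequenceComparator != null) {
            return sequenceComparator.compare(kv, current) > 0;
        }
        return kv.level < current.level; // original behavior
    }

    /** Order in which records would be handed to a last-record-wins merge. */
    List<Kv> mergeOrder() {
        List<Kv> ordered = new ArrayList<>(candidates);
        if (sequenceComparator != null) {
            ordered.sort(sequenceComparator); // stable sort, highest sequence last
        }
        return ordered;
    }
}
```

   Passing `null` to the constructor leaves both the pick and the merge order unchanged from the level-based behavior, which is the backward-compatibility property described in item 4.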
   
   ### Tests
   
   Added 3 test cases:
   - `testSequenceFieldWithMultipleLevels` - verifies `sequence.field` is used correctly
   - `testWithoutSequenceFieldPreservesOriginalBehavior` - verifies backward compatibility
   - `testSequenceFieldWithDescendingSortOrder` - verifies descending sort order works
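
   For the descending case, one way to picture the expected behavior is a reversed comparator. The sketch below is illustrative only and is not the test code from this PR: the class and method names are hypothetical, and it assumes that "descending" means the record with the smallest sequence value is treated as the winner.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class DescendingSequenceSketch {

    /** Natural order for ascending, reversed order for descending (assumption). */
    static Comparator<Long> sequenceComparator(boolean ascending) {
        Comparator<Long> natural = Comparator.naturalOrder();
        return ascending ? natural : natural.reversed();
    }

    /** Returns the sequence value that ranks highest under the chosen order. */
    static long pickWinner(List<Long> sequences, boolean ascending) {
        return Collections.max(sequences, sequenceComparator(ascending));
    }
}
```

   Under these assumptions, the sequences 7, 8, 6 yield 8 as the winner in ascending order and 6 in descending order.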
   
   ### Impact
   
   Only affects `changelog-producer = lookup` with `sequence.field` configured. 
No impact on:
   - Normal queries (Batch/Streaming Scan)
   - Lookup Join
   - Tables without `sequence.field`

