Liulietong opened a new issue, #7220: URL: https://github.com/apache/paimon/issues/7220
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found no similar issues.

### Paimon version

master (latest)

### Compute Engine

None

### Minimal reproduce step

When using `changelog-producer = lookup` with `sequence.field` configured, `LookupMergeFunction.pickHighLevel()` may select the wrong "old" record when out-of-order data arrives.

**Configuration:**

```sql
CREATE TABLE test (
    id INT PRIMARY KEY NOT ENFORCED,
    value INT,
    update_time BIGINT
) WITH (
    'changelog-producer' = 'lookup',
    'sequence.field' = 'update_time'
);
```

**Scenario:**

```
Initial state after compaction:
  L1: (id=1, value=100, update_time=7)
  L2: (id=1, value=200, update_time=8)  ← Actually newer!

New out-of-order data arrives at L0:
  L0: (id=1, value=50, update_time=6)   ← Old data arriving late
```

**Expected behavior:**

- `pickHighLevel()` should select L2 (update_time=8) as the "latest" high-level record
- The result should reflect the record with the highest sequence value

**Actual behavior:**

- `pickHighLevel()` selects L1 (level 1 < level 2), ignoring `sequence.field`
- The wrong changelog is generated

### What doesn't meet your expectations?

`LookupMergeFunction.pickHighLevel()` only compares level numbers, ignoring `sequence.field`:

```java
// LookupMergeFunction.java:88 - Current behavior
if (highLevel == null || kv.level() < highLevel.level()) {
    highLevel = kv; // Always picks lowest level, ignores sequence
}
```

**Reproducible scenario:**

```java
// When candidates contain:
//   L1: (key=1, sequence=7)  <- level 1
//   L2: (key=1, sequence=8)  <- level 2, but higher sequence (newer!)
// pickHighLevel() returns L1 (because level 1 < 2)
// But should return L2 (because sequence 8 > 7)
```

It should use the `sequence.field` comparator when one is configured, similar to how `SortMergeReaderWithMinHeap` correctly handles it at lines 61-67.

### Anything else?

This issue only affects the `changelog-producer = lookup` scenario.
Normal queries (Batch/Streaming Scan) and Lookup Join are not affected.

I'm working on a fix and will submit a PR shortly. The PR includes a complete unit test to reproduce this issue.
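
For illustration only, the difference between level-only and sequence-aware selection can be sketched in a few lines of standalone Java. This is not Paimon's code and not the fix from the PR: `Kv`, `PickHighLevelSketch`, `pickByLevel`, and `pickBySequence` are hypothetical names, the `long` field stands in for the configured `sequence.field` comparator, and skipping records at level 0 or below is an assumption about what counts as a "high-level" candidate.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a candidate record; NOT Paimon's KeyValue class.
final class Kv {
    final int level;      // LSM level the record was read from
    final long sequence;  // value of the sequence field (update_time in the report)
    final int value;

    Kv(int level, long sequence, int value) {
        this.level = level;
        this.sequence = sequence;
        this.value = value;
    }
}

final class PickHighLevelSketch {

    // Level-only selection as described in the report: the lowest level above 0 wins.
    // Skipping level <= 0 is an assumption made for this sketch.
    static Kv pickByLevel(List<Kv> candidates) {
        Kv highLevel = null;
        for (Kv kv : candidates) {
            if (kv.level <= 0) {
                continue;
            }
            if (highLevel == null || kv.level < highLevel.level) {
                highLevel = kv;
            }
        }
        return highLevel;
    }

    // Sequence-aware selection: the highest sequence wins; ties fall back to the lowest level.
    static Kv pickBySequence(List<Kv> candidates) {
        Kv best = null;
        for (Kv kv : candidates) {
            if (kv.level <= 0) {
                continue;
            }
            boolean newer = best == null
                    || kv.sequence > best.sequence
                    || (kv.sequence == best.sequence && kv.level < best.level);
            if (newer) {
                best = kv;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The scenario from the report: L1 seq=7, L2 seq=8, late L0 record seq=6.
        List<Kv> candidates = Arrays.asList(
                new Kv(1, 7L, 100),
                new Kv(2, 8L, 200),
                new Kv(0, 6L, 50));

        Kv byLevel = pickByLevel(candidates);
        Kv bySequence = pickBySequence(candidates);
        System.out.println("level-only pick:     L" + byLevel.level + " sequence=" + byLevel.sequence);
        System.out.println("sequence-aware pick: L" + bySequence.level + " sequence=" + bySequence.sequence);
    }
}
```

With the three records from the scenario above, the level-only variant returns the level 1 record (sequence 7), while the sequence-aware variant returns the level 2 record (sequence 8), which is the behavior this issue asks for.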
