[I] Limit Scan will emit wrong data for partitioned table in partition filter scan [fluss]

via GitHub Mon, 12 Jan 2026 04:01:33 -0800


luoyuxia opened a new issue, #2353:
URL: https://github.com/apache/fluss/issues/2353


   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/fluss/issues) and 
found nothing similar.
   
   
   ### Fluss version
   
   0.8.0 (latest release)
   
   ### Please describe the bug 🐞
   
   Can be reproduce by the following test, add a new test method in 
FlinkUnionReadPrimaryKeyTableITCase
   ```
   @Test
       void t1() throws Exception {
           boolean isPartitioned = true;
           JobClient jobClient = buildTieringJob(execEnv);
   
           String tableName = "pk_table_full" + (isPartitioned ? "_partitioned" 
: "_non_partitioned");
           TablePath t1 = TablePath.of(DEFAULT_DB, tableName);
           Map<TableBucket, Long> bucketLogEndOffset = new HashMap<>();
           // create table & write initial data
           long tableId =
                   preparePKTableFullType(t1, DEFAULT_BUCKET_NUM, 
isPartitioned, bucketLogEndOffset);
   
   //        // wait unit records have been synced
   //        waitUntilBucketSynced(t1, tableId, DEFAULT_BUCKET_NUM, 
isPartitioned);
   //
   //        // check the status of replica after synced
   //        assertReplicaStatus(t1, tableId, DEFAULT_BUCKET_NUM, 
isPartitioned, bucketLogEndOffset);
   
           // will read paimon snapshot, won't merge log since it's empty
           List<String> resultEmptyLog =
                   toSortedRows(
                           batchTEnv.executeSql(
                                   "select * from " + tableName + " where c16 = 
'not_exists' limit 10"));
           System.out.println(resultEmptyLog);
   
           jobClient.cancel().get();
       }
   ```
   
   Then print 
   ```
   [+I[false, 1, 2, 3, 4, 5.1, 6.0, string, 0.09, 10, 2023-10-25T12:01:13.182Z, 
2023-10-25T12:01:13.182005Z, 2023-10-25T12:01:13.183, 
2023-10-25T12:01:13.183006, [1, 2, 3, 4], not_exists], +I[false, 1, 2, 3, 4, 
5.1, 6.0, string, 0.09, 10, 2023-10-25T12:01:13.182Z, 
2023-10-25T12:01:13.182005Z, 2023-10-25T12:01:13.183, 
2023-10-25T12:01:13.183006, [1, 2, 3, 4], not_exists], +I[true, 10, 20, 30, 40, 
50.1, 60.0, another_string, 0.90, 100, 2023-10-25T12:01:13.200Z, 
2023-10-25T12:01:13.200005Z, 2023-10-25T12:01:13.201, 
2023-10-25T12:01:13.201006, [1, 2, 3, 4], not_exists], +I[true, 10, 20, 30, 40, 
50.1, 60.0, another_string, 0.90, 100, 2023-10-25T12:01:13.200Z, 
2023-10-25T12:01:13.200005Z, 2023-10-25T12:01:13.201, 
2023-10-25T12:01:13.201006, [1, 2, 3, 4], not_exists]]
   
   ```
   
   
   Note the main branch won't have the issue, the reason is after #1934 , the 
filter won't be accept, so the limit won't push down.
   
   But I create the issue to track it, the reason is that the source accept the 
partition filter, and then do the limit scan, but the limit scan don't respect 
the partition  filter and return the result. Flink'll append the partition 
predicate `'not_exists'` into the scan row, which cause wrong result.
   
    
   
   ### Solution
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[I] Limit Scan will emit wrong data for partitioned table in partition filter scan [fluss]

Reply via email to