Re: [PR] [core] Apply LIMIT during merge-file split read when safe. [paimon]

via GitHub Mon, 15 Jun 2026 19:20:41 -0700


JingsongLi commented on code in PR #8116:
URL: https://github.com/apache/paimon/pull/8116#discussion_r3417732182



##########
paimon-core/src/main/java/org/apache/paimon/operation/MergeFileSplitRead.java:
##########
@@ -312,6 +330,7 @@ public RecordReader<KeyValue> createMergeReader(
             reader = new DropDeleteReader(reader);
         }
 
+        reader = LimitRecordReader.limit(reader, effectiveReadLimit());

Review Comment:
   The limit is scoped to this split reader, but 
`TableRead.createReader(List<Split>)` just concatenates one reader per split. 
If a plan still contains multiple `DataSplit`s (for example multiple 
buckets/partitions, or any case where scan-side limit pushdown cannot shrink 
the plan to one split), each split can now emit up to `limit` rows, so 
`table.newRead().withLimit(10).createReader(plan.splits())` may return 
20, 30, or more rows instead of 10. The new table-read test only covers the default 
single-bucket plan, so it does not catch this. Please either keep the read-side 
limit global across the concatenated plan, or only enable this optimization 
when the plan is guaranteed to contain a single split, and add a multi-split 
regression test.
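
   To illustrate the concern, here is a minimal sketch that uses plain Java 
   iterators as stand-ins for per-split readers. It does not use Paimon's 
   `RecordReader` or `LimitRecordReader`; the class `GlobalLimitSketch` and 
   all of its helpers are hypothetical names chosen for this example. It 
   compares limiting each split before concatenation with limiting the 
   concatenated stream once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only: iterators stand in for per-split readers.
final class GlobalLimitSketch {

    /** Caps a single iterator at maxRows elements. */
    static <T> Iterator<T> withLimit(Iterator<T> source, long maxRows) {
        return new Iterator<T>() {
            private long remaining = maxRows;

            @Override
            public boolean hasNext() {
                return remaining > 0 && source.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                remaining--;
                return source.next();
            }
        };
    }

    /** Reads the given iterators one after another, like a concatenated plan. */
    static <T> Iterator<T> concat(List<Iterator<T>> parts) {
        return new Iterator<T>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                // Skip exhausted parts until one has data or none are left.
                while (index < parts.size() && !parts.get(index).hasNext()) {
                    index++;
                }
                return index < parts.size();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return parts.get(index).next();
            }
        };
    }

    /** A fake split that yields the integers 0..rows-1. */
    static Iterator<Integer> split(int rows) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            data.add(i);
        }
        return data.iterator();
    }

    static <T> int count(Iterator<T> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    /** Limit applied inside every split reader, then concatenated. */
    static int perSplitLimited(int splits, int rowsPerSplit, long maxRows) {
        List<Iterator<Integer>> readers = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            readers.add(withLimit(split(rowsPerSplit), maxRows));
        }
        return count(concat(readers));
    }

    /** Splits concatenated first, limit applied once to the whole stream. */
    static int globallyLimited(int splits, int rowsPerSplit, long maxRows) {
        List<Iterator<Integer>> readers = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            readers.add(split(rowsPerSplit));
        }
        return count(withLimit(concat(readers), maxRows));
    }

    public static void main(String[] args) {
        System.out.println(perSplitLimited(3, 100, 10)); // prints 30
        System.out.println(globallyLimited(3, 100, 10)); // prints 10
    }
}
```

   With three splits and a limit of 10, the per-split variant yields 30 rows 
   while the global variant yields 10. A global cap can still sit on top of a 
   per-split cap, so the per-split limit may remain useful as an early-exit 
   optimization as long as the concatenated reader enforces the final count.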
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
