Re: [PR] [core] Apply LIMIT during merge-file split read when safe. [paimon]

via GitHub Mon, 15 Jun 2026 20:16:59 -0700


wwj6591812 commented on code in PR #8116:
URL: https://github.com/apache/paimon/pull/8116#discussion_r3417899653



##########
paimon-core/src/main/java/org/apache/paimon/operation/MergeFileSplitRead.java:
##########
@@ -312,6 +330,7 @@ public RecordReader<KeyValue> createMergeReader(
             reader = new DropDeleteReader(reader);
         }
 
+        reader = LimitRecordReader.limit(reader, effectiveReadLimit());

Review Comment:
   Thanks for catching this, @JingsongLi .
   
   You're right: applying the limit inside each split reader is incorrect when 
TableRead.createReader(List<Split>) concatenates multiple splits. Each split 
could emit up to limit rows, so the combined result could contain up to limit 
rows per split rather than limit rows in total.
   
   I've fixed this with a hybrid approach:
   
   - Single-split plans: keep the merge-read limit optimization (early 
     truncation during merge).
   - Multi-split plans: disable the per-split merge-read limit and apply a 
     global LimitRecordReader at the KeyValueTableRead.createReader(List<Split>) 
     level, so the total row count is capped correctly.
   
   I also added testReadWithLimitThroughTableReadPathMultiSplit in 
PrimaryKeySimpleTableTest (BUCKET=4, overlapping L0 files across buckets) to 
verify that withLimit(10).createReader(plan.splits()) returns exactly 10 rows 
when the plan contains multiple splits.
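   
   To illustrate the difference between a per-split limit and a global limit, 
here is a minimal sketch using plain Java iterators. It does not use Paimon's 
classes: LimitSketch, capRows, concat, and fakeSplit are hypothetical names made 
up for this example, and the real LimitRecordReader may behave differently in 
its details.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the readers discussed above; not Paimon code.
public class LimitSketch {

    /** Wraps an iterator so that it yields at most maxRows elements. */
    public static <T> Iterator<T> capRows(Iterator<T> source, long maxRows) {
        return new Iterator<T>() {
            private long emitted = 0;

            @Override
            public boolean hasNext() {
                return emitted < maxRows && source.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                emitted++;
                return source.next();
            }
        };
    }

    /** Concatenates per-split iterators, like a reader built over a list of splits. */
    public static <T> Iterator<T> concat(List<Iterator<T>> splits) {
        Iterator<Iterator<T>> outer = splits.iterator();
        return new Iterator<T>() {
            private Iterator<T> current = null;

            @Override
            public boolean hasNext() {
                // Advance to the next split that still has rows.
                while ((current == null || !current.hasNext()) && outer.hasNext()) {
                    current = outer.next();
                }
                return current != null && current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    /** Builds a fake split holding the given number of rows. */
    static Iterator<Integer> fakeSplit(int rows) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            data.add(i);
        }
        return data.iterator();
    }

    static <T> int count(Iterator<T> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    /** Limit applied inside each split only: the total is not capped. */
    public static int perSplitLimit(int splits, int rowsPerSplit, long limit) {
        List<Iterator<Integer>> readers = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            readers.add(capRows(fakeSplit(rowsPerSplit), limit));
        }
        return count(concat(readers));
    }

    /** Limit applied once over the concatenated reader: the total is capped. */
    public static int globalLimit(int splits, int rowsPerSplit, long limit) {
        List<Iterator<Integer>> readers = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            readers.add(fakeSplit(rowsPerSplit));
        }
        return count(capRows(concat(readers), limit));
    }

    public static void main(String[] args) {
        System.out.println(perSplitLimit(4, 25, 10)); // prints 40
        System.out.println(globalLimit(4, 25, 10));   // prints 10
    }
}
```

   With 4 splits of 25 rows each and a limit of 10, the per-split variant emits 
40 rows while the global variant emits 10, which is the behavior the new 
multi-split test checks for.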
   
   Please take another look. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
