wwj6591812 commented on code in PR #8116:
URL: https://github.com/apache/paimon/pull/8116#discussion_r3417899653
##########
paimon-core/src/main/java/org/apache/paimon/operation/MergeFileSplitRead.java:
##########
@@ -312,6 +330,7 @@ public RecordReader<KeyValue> createMergeReader(
reader = new DropDeleteReader(reader);
}
+ reader = LimitRecordReader.limit(reader, effectiveReadLimit());
Review Comment:
Thanks for catching this, @JingsongLi.
You're right: applying the limit inside each split reader is incorrect when
TableRead.createReader(List<Split>) concatenates multiple splits. Each split
can emit up to limit rows on its own, so a plan with N splits can return up to
N * limit rows instead of limit.
I've fixed this with a hybrid approach:
- Single-split plans: keep the merge-read limit optimization (early truncation
  during merge).
- Multi-split plans: disable the per-split merge-read limit and apply a global
  LimitRecordReader at the KeyValueTableRead.createReader(List<Split>) level, so
  the total row count is capped correctly.
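
To make the difference concrete, here is a minimal sketch using plain Java iterators rather than the actual Paimon readers. Every name in it (GlobalLimitSketch, limit, concat, makeSplits) is hypothetical and only stands in for the idea; it is not the code in this PR. It compares capping each split separately with capping the concatenated stream once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only: plain iterators stand in for split readers.
public class GlobalLimitSketch {

    /** Wraps an iterator so that it yields at most `limit` elements. */
    public static <T> Iterator<T> limit(Iterator<T> source, long limit) {
        return new Iterator<T>() {
            private long emitted = 0;

            @Override
            public boolean hasNext() {
                return emitted < limit && source.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                emitted++;
                return source.next();
            }
        };
    }

    /** Lazily concatenates several iterators, in order. */
    public static <T> Iterator<T> concat(List<Iterator<T>> parts) {
        Iterator<Iterator<T>> outer = parts.iterator();
        return new Iterator<T>() {
            private Iterator<T> current = null;

            @Override
            public boolean hasNext() {
                // Advance to the next non-empty part, if any.
                while ((current == null || !current.hasNext()) && outer.hasNext()) {
                    current = outer.next();
                }
                return current != null && current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    /** Builds `splitCount` fake splits with `rowsPerSplit` rows each. */
    public static List<Iterator<Integer>> makeSplits(int splitCount, int rowsPerSplit) {
        List<Iterator<Integer>> splits = new ArrayList<>();
        for (int s = 0; s < splitCount; s++) {
            List<Integer> rows = new ArrayList<>();
            for (int r = 0; r < rowsPerSplit; r++) {
                rows.add(s * rowsPerSplit + r);
            }
            splits.add(rows.iterator());
        }
        return splits;
    }

    public static int count(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    /** Limit applied inside every split, then concatenated (the incorrect variant). */
    public static int perSplitLimit(int splitCount, int rowsPerSplit, long limit) {
        List<Iterator<Integer>> limited = new ArrayList<>();
        for (Iterator<Integer> split : makeSplits(splitCount, rowsPerSplit)) {
            limited.add(limit(split, limit));
        }
        return count(concat(limited));
    }

    /** Splits concatenated first, limit applied once on top (the corrected variant). */
    public static int globalLimit(int splitCount, int rowsPerSplit, long limit) {
        return count(limit(concat(makeSplits(splitCount, rowsPerSplit)), limit));
    }

    public static void main(String[] args) {
        System.out.println("per-split limit: " + perSplitLimit(4, 25, 10)); // 40
        System.out.println("global limit:    " + globalLimit(4, 25, 10));   // 10
    }
}
```

With 4 splits and a limit of 10, the per-split variant returns 40 rows and the global variant returns 10. With a single split the two variants agree, which is why the early-truncation optimization can stay in place for single-split plans.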
Added testReadWithLimitThroughTableReadPathMultiSplit in
PrimaryKeySimpleTableTest (BUCKET=4, overlapping L0 files across buckets) to
verify that withLimit(10).createReader(plan.splits()) returns exactly 10 rows
when the plan contains multiple splits.
Please take another look. Thanks!