JingsongLi commented on code in PR #8116:
URL: https://github.com/apache/paimon/pull/8116#discussion_r3417732182
##########
paimon-core/src/main/java/org/apache/paimon/operation/MergeFileSplitRead.java:
##########
@@ -312,6 +330,7 @@ public RecordReader<KeyValue> createMergeReader(
reader = new DropDeleteReader(reader);
}
+ reader = LimitRecordReader.limit(reader, effectiveReadLimit());
Review Comment:
The limit is scoped to this split reader, but
`TableRead.createReader(List<Split>)` just concatenates one reader per split.
If a plan still contains multiple `DataSplit`s (for example multiple
buckets/partitions, or any case where scan-side limit pushdown cannot shrink
the plan to one split), each split can now emit up to `limit` rows, so
`table.newRead().withLimit(10).createReader(plan.splits())` may return up to
10 rows per split (20, 30, or more in total) instead of at most 10. The new
table-read test only covers the default single-bucket plan, so it does not
catch this. Please either keep the read-side
limit global across the concatenated plan, or only enable this optimization
when the plan is guaranteed to contain a single split, and add a multi-split
regression test.
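
A minimal sketch of the overcount, using plain `java.util.Iterator`s as
hypothetical stand-ins for the per-split readers. None of the names below are
Paimon APIs; `limit`, `concat`, and `splits` are illustrative helpers only, and
the real reader classes may behave differently in detail.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class LimitScopeSketch {

    // Hypothetical stand-in for a limiting reader: stops after `limit` rows.
    static <T> Iterator<T> limit(Iterator<T> in, long limit) {
        return new Iterator<T>() {
            private long emitted = 0;

            @Override
            public boolean hasNext() {
                return emitted < limit && in.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                emitted++;
                return in.next();
            }
        };
    }

    // Stand-in for concatenating one reader per split.
    static <T> Iterator<T> concat(List<Iterator<T>> readers) {
        Iterator<Iterator<T>> outer = readers.iterator();
        return new Iterator<T>() {
            private Iterator<T> current = null;

            @Override
            public boolean hasNext() {
                // Advance to the next non-exhausted per-split reader.
                while ((current == null || !current.hasNext()) && outer.hasNext()) {
                    current = outer.next();
                }
                return current != null && current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    // Builds `splitCount` fake splits with `rowsPerSplit` rows each.
    static List<List<Integer>> splits(int splitCount, int rowsPerSplit) {
        List<List<Integer>> result = new ArrayList<>();
        for (int s = 0; s < splitCount; s++) {
            List<Integer> rows = new ArrayList<>();
            for (int r = 0; r < rowsPerSplit; r++) {
                rows.add(s * rowsPerSplit + r);
            }
            result.add(rows);
        }
        return result;
    }

    static int count(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    // Limit applied inside each split reader, then concatenated.
    static int countPerSplitLimit(List<List<Integer>> plan, long limit) {
        List<Iterator<Integer>> readers = new ArrayList<>();
        for (List<Integer> split : plan) {
            readers.add(limit(split.iterator(), limit));
        }
        return count(concat(readers));
    }

    // Limit applied once, around the concatenated reader.
    static int countGlobalLimit(List<List<Integer>> plan, long limit) {
        List<Iterator<Integer>> readers = new ArrayList<>();
        for (List<Integer> split : plan) {
            readers.add(split.iterator());
        }
        return count(limit(concat(readers), limit));
    }

    public static void main(String[] args) {
        List<List<Integer>> plan = splits(3, 100);
        System.out.println(countPerSplitLimit(plan, 10)); // prints 30
        System.out.println(countGlobalLimit(plan, 10));   // prints 10
    }
}
```

With three splits and a limit of 10, the per-split variant yields 30 rows,
while wrapping the concatenated reader yields 10. A multi-split regression test
could assert the second behavior.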