TheR1sing3un commented on PR #7742: URL: https://github.com/apache/paimon/pull/7742#issuecomment-4353190017
Did a strict line-by-line audit against the current Java source `paimon-core/.../source/DataTableBatchScan.java:128-165`. Found one branch I'd missed: > **Java L129**: `if (pushDownLimit == null || snapshotReader.hasNonPartitionFilter()) return Optional.empty();` When the user's predicate references any non-partition column, Java skips limit pushdown entirely. Reason: per-split row counts (the accumulator input) are pre-filter, and summing them against `limit` would over-count once the reader applies the predicate, causing the early-return budget to be reached too soon and giving the reader fewer rows than `limit` actually asks for. Pushed `68656b74d` to mirror this. New private helper `_has_non_partition_filter()` mirrors Java's `SnapshotReaderImpl.hasNonPartitionFilter` (lines 235-248) using the existing `_get_all_fields` predicate walker; the limit-pushdown loop short-circuits when the predicate references any column outside `partition_keys`. Inline comments on `_apply_push_down_limit` now annotate every Java source line we mirror (L129 / L138 / L146 / L147-163 / L164) so the port is verifiable at a glance. Full Java vs Python correspondence after this round: | Java | Python | Status | |---|---|---| | L129 `pushDownLimit == null` | `if self.limit is None` | ✓ | | L129 `hasNonPartitionFilter()` | `if self._has_non_partition_filter()` | ✓ (this round) | | L138 `scannedRowCount = 0` | `scanned_row_count = 0` | ✓ | | L146 `limitedSplits = new ArrayList<>()` | `limited_splits = []` | ✓ | | L147 `for (Split split : splits)` | `for split in splits` | ✓ | | L148 `mergedRowCount = split.mergedRowCount()` | `merged = split.merged_row_count()` | ✓ | | L149 `if (mergedRowCount.isPresent())` | `if merged is not None` | ✓ | | L150 `limitedSplits.add(split)` | `limited_splits.append(split)` | ✓ | | L151 `scannedRowCount += mergedRowCount.getAsLong()` | `scanned_row_count += merged` | ✓ | | L152 `if (scannedRowCount >= pushDownLimit)` | `if scanned_row_count >= self.limit` | ✓ | | L153-160 build newPlan + return | `return limited_splits` | ✓ semantically | | L164 `return Optional.of(result)` | `return splits` | ✓ | Only Java element not ported: the two `LOG.info(...)` lines (decorative, no behavior). 11/11 unit tests pass, flake8 clean. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
