zhuqi-lucas opened a new pull request, #21170: URL: https://github.com/apache/datafusion/pull/21170
## Which issue does this PR close?

- Closes #21169.

## Rationale for this change

When `LimitPushdown` merges a `GlobalLimitExec` into a `CoalescePartitionsExec` or `SortPreservingMergeExec` as a `fetch` value, the `EnforceDistribution` optimizer rule strips and re-inserts these distribution-changing operators **without preserving the `fetch`**. This silently drops the LIMIT for queries over multi-partition sources, potentially returning duplicate/extra rows.

## What changes are included in this PR?

1. **`remove_dist_changing_operators`** now captures any `fetch` value (and the original `SortPreservingMergeExec` if present) before stripping operators.
2. **`add_merge_on_top`** accepts an optional `fetch` and applies it to the newly created `SortPreservingMergeExec`.
3. **Fallback re-introduction**: if the `fetch` was not consumed by `add_merge_on_top` (e.g., when the parent had `UnspecifiedDistribution` or the child already had a single partition), the limit is re-introduced as a wrapping operator so it is never silently lost.

## Are these changes tested?

Yes, three new tests are added:

- `coalesce_partitions_fetch_preserved_by_enforce_distribution`: unsorted multi-partition source with `CoalescePartitionsExec(fetch=1)`
- `coalesce_partitions_fetch_preserved_sorted`: sorted multi-partition source with `CoalescePartitionsExec(fetch=5)`
- `spm_fetch_preserved_by_enforce_distribution`: sorted multi-partition source with `SortPreservingMergeExec(fetch=3)`

## Are there any user-facing changes?

No API changes. Queries with LIMIT over multi-partition sources will now correctly preserve the limit through the `EnforceDistribution` optimizer pass.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
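For readers following along without the diff, the three changes listed in the PR description can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Plan` enum and the functions `strip_dist_changing`, `add_merge_on_top`, and `enforce` are hypothetical stand-ins written for this example and are not DataFusion's `ExecutionPlan` API or the PR's actual code. Keeping the smallest `fetch` when several are stripped is also an assumption of the sketch, and the toy always re-adds a merge node where the real rule chooses between operators.

```rust
// Toy model only: hypothetical types, NOT DataFusion's ExecutionPlan API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Multi-partition leaf source.
    Source { partitions: usize },
    /// Stand-in for CoalescePartitionsExec.
    Coalesce { fetch: Option<usize>, input: Box<Plan> },
    /// Stand-in for SortPreservingMergeExec.
    Merge { fetch: Option<usize>, input: Box<Plan> },
    /// Stand-in for a wrapping limit operator.
    Limit { fetch: usize, input: Box<Plan> },
}

/// Strip distribution-changing operators from the top of the plan and
/// return the stripped plan together with any `fetch` that was seen.
/// Assumption: if several are seen, the smallest one is kept.
fn strip_dist_changing(mut plan: Plan) -> (Plan, Option<usize>) {
    let mut captured: Option<usize> = None;
    loop {
        match plan {
            Plan::Coalesce { fetch: f, input } | Plan::Merge { fetch: f, input } => {
                captured = match (captured, f) {
                    (Some(a), Some(b)) => Some(a.min(b)),
                    (a, b) => a.or(b),
                };
                plan = *input;
            }
            other => return (other, captured),
        }
    }
}

/// Re-add a merge on top, consuming the captured `fetch` if there is one.
fn add_merge_on_top(plan: Plan, fetch: &mut Option<usize>) -> Plan {
    Plan::Merge { fetch: fetch.take(), input: Box::new(plan) }
}

/// Strip, optionally re-add a merge, then fall back to a wrapping limit
/// if the captured `fetch` was not consumed.
fn enforce(plan: Plan, needs_merge: bool) -> Plan {
    let (stripped, mut fetch) = strip_dist_changing(plan);
    let mut out = if needs_merge {
        add_merge_on_top(stripped, &mut fetch)
    } else {
        stripped
    };
    if let Some(n) = fetch.take() {
        // Fallback: the limit is never silently dropped.
        out = Plan::Limit { fetch: n, input: Box::new(out) };
    }
    out
}

fn main() {
    let input = Plan::Coalesce {
        fetch: Some(1),
        input: Box::new(Plan::Source { partitions: 4 }),
    };
    println!("{:?}", enforce(input.clone(), true));
    println!("{:?}", enforce(input, false));
}
```

In this model the first call carries `fetch` onto the re-inserted merge node, while the second call, where no merge is re-added, wraps the source in a limit instead. Both paths keep the limit, which is the invariant the PR description sets out to restore.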
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
