zhuqi-lucas opened a new issue, #21169: URL: https://github.com/apache/datafusion/issues/21169
### Describe the bug When `LimitPushdown` merges a `GlobalLimitExec` into a `CoalescePartitionsExec` (or `SortPreservingMergeExec`) as a `fetch` value, the `EnforceDistribution` optimizer rule strips and re-inserts distribution-changing operators **without preserving the `fetch`**. This causes queries with `LIMIT` over multi-partition sources to silently lose the limit and potentially return duplicate/extra rows. ### Root cause In `enforce_distribution.rs`, the function `remove_dist_changing_operators` strips `CoalescePartitionsExec`, `SortPreservingMergeExec`, and `RepartitionExec` from the plan tree. It does not capture or propagate any `fetch` value that was embedded in those operators. Later, when `add_merge_on_top` re-inserts a merge operator to satisfy `SinglePartition` distribution, the `fetch` is gone. ### To Reproduce 1. Create a parquet table with multiple row groups / partitions. 2. Run a query with `LIMIT`, e.g. `SELECT * FROM t LIMIT 1`. 3. After `LimitPushdown`, the plan has `CoalescePartitionsExec(fetch=1)`. 4. `EnforceDistribution` strips the `CoalescePartitionsExec` and re-inserts one without `fetch`. 5. The limit is silently lost. ### Expected behavior `EnforceDistribution` should preserve the `fetch` value when removing and re-inserting distribution-changing operators. ### Additional context This is analogous to the existing logic that preserves ordering through `SortPreservingMergeExec` — the `fetch` (limit push-down) should receive the same treatment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
