zhuqi-lucas opened a new issue, #21169:
URL: https://github.com/apache/datafusion/issues/21169

   ### Describe the bug
   
   When `LimitPushdown` merges a `GlobalLimitExec` into a 
`CoalescePartitionsExec` (or `SortPreservingMergeExec`) as a `fetch` value, the 
`EnforceDistribution` optimizer rule strips and re-inserts 
distribution-changing operators **without preserving the `fetch`**. This causes 
queries with `LIMIT` over multi-partition sources to silently lose the limit 
and potentially return duplicate/extra rows.
   
   ### Root cause
   
   In `enforce_distribution.rs`, the function `remove_dist_changing_operators` 
strips `CoalescePartitionsExec`, `SortPreservingMergeExec`, and 
`RepartitionExec` from the plan tree. It does not capture or propagate any 
`fetch` value that was embedded in those operators. Later, when 
`add_merge_on_top` re-inserts a merge operator to satisfy `SinglePartition` 
distribution, the `fetch` is gone.
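
The strip-then-reinsert sequence described above can be illustrated with a toy model. This is only a sketch: the `Plan` enum and the functions `strip_dist_changing`, `add_merge_on_top`, `fetch_of`, and `round_trip` are hypothetical stand-ins written for this example, not DataFusion's actual `ExecutionPlan` API or the real code in `enforce_distribution.rs`.

```rust
// Toy model only: these types and functions are hypothetical stand-ins,
// not DataFusion's real ExecutionPlan API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Stand-in for a multi-partition source.
    Scan,
    /// Stand-in for CoalescePartitionsExec with an optional pushed-down limit.
    Coalesce { fetch: Option<usize>, input: Box<Plan> },
}

/// Mimics the stripping step: drop the merge operator and return its child,
/// discarding whatever `fetch` the operator carried.
fn strip_dist_changing(plan: Plan) -> Plan {
    match plan {
        Plan::Coalesce { input, .. } => strip_dist_changing(*input),
        other => other,
    }
}

/// Mimics the re-insertion step: wrap the child in a fresh merge operator.
/// No fetch value reaches this point, so the new operator has none.
fn add_merge_on_top(child: Plan) -> Plan {
    Plan::Coalesce { fetch: None, input: Box::new(child) }
}

/// Read the fetch of the top-level operator, if any.
fn fetch_of(plan: &Plan) -> Option<usize> {
    match plan {
        Plan::Coalesce { fetch, .. } => *fetch,
        Plan::Scan => None,
    }
}

/// Strip and re-insert, as the optimizer rule is described to do.
fn round_trip(plan: Plan) -> Plan {
    add_merge_on_top(strip_dist_changing(plan))
}

fn main() {
    let before = Plan::Coalesce { fetch: Some(1), input: Box::new(Plan::Scan) };
    let after = round_trip(before.clone());
    println!("fetch before: {:?}", fetch_of(&before));
    println!("fetch after:  {:?}", fetch_of(&after));
}
```

In this model the plan shape is unchanged by the round trip, so nothing looks wrong structurally; only the `fetch` field differs, which matches the "silent" nature of the bug.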
   
   ### To Reproduce
   
   1. Create a Parquet table with multiple row groups / partitions.
   2. Run a query with `LIMIT`, e.g. `SELECT * FROM t LIMIT 1`.
   3. After `LimitPushdown`, the plan has `CoalescePartitionsExec(fetch=1)`.
   4. `EnforceDistribution` strips the `CoalescePartitionsExec` and re-inserts 
one without `fetch`.
   5. The limit is silently lost.
   
   ### Expected behavior
   
   `EnforceDistribution` should preserve the `fetch` value when removing and 
re-inserting distribution-changing operators.
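
One possible shape for the expected behavior, again as a toy sketch rather than a proposed patch: the stripping step returns the `fetch` it removed, and the re-insertion step takes it back as an argument. All names here (`Plan`, `strip_keeping_fetch`, `add_merge_with_fetch`, `execute`) are hypothetical, and taking the minimum of stacked limits is an assumption of this model, not something stated in the issue.

```rust
// Toy model only: hypothetical types, not DataFusion's ExecutionPlan API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A source with several partitions, each holding some rows.
    Scan { partitions: Vec<Vec<i32>> },
    /// Stand-in for a merge operator with an optional pushed-down limit.
    Coalesce { fetch: Option<usize>, input: Box<Plan> },
}

/// Strip merge operators, but hand back the fetch they carried.
/// Assumption: if several are stacked, the smallest limit wins.
fn strip_keeping_fetch(plan: Plan) -> (Plan, Option<usize>) {
    match plan {
        Plan::Coalesce { fetch, input } => {
            let (child, inner) = strip_keeping_fetch(*input);
            let merged = match (fetch, inner) {
                (Some(a), Some(b)) => Some(a.min(b)),
                (a, b) => a.or(b),
            };
            (child, merged)
        }
        other => (other, None),
    }
}

/// Re-insert a merge operator, restoring the captured fetch.
fn add_merge_with_fetch(child: Plan, fetch: Option<usize>) -> Plan {
    Plan::Coalesce { fetch, input: Box::new(child) }
}

/// Execute the toy plan into a single list of rows.
fn execute(plan: &Plan) -> Vec<i32> {
    match plan {
        Plan::Scan { partitions } => partitions.iter().flatten().copied().collect(),
        Plan::Coalesce { fetch, input } => {
            let rows = execute(input);
            match fetch {
                Some(n) => rows.into_iter().take(*n).collect(),
                None => rows,
            }
        }
    }
}

fn main() {
    let scan = Plan::Scan { partitions: vec![vec![1, 2], vec![3, 4]] };
    let limited = Plan::Coalesce { fetch: Some(1), input: Box::new(scan) };
    let (child, fetch) = strip_keeping_fetch(limited);
    let fixed = add_merge_with_fetch(child.clone(), fetch);
    let lossy = add_merge_with_fetch(child, None);
    println!("fetch preserved: {} row(s)", execute(&fixed).len());
    println!("fetch dropped:   {} row(s)", execute(&lossy).len());
}
```

With the fetch carried through, the toy plan yields 1 row; with it dropped, all 4 rows from both partitions come back, which mirrors the `LIMIT 1` reproduction above.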
   
   ### Additional context
   
   This is analogous to the existing logic that preserves ordering through 
`SortPreservingMergeExec`: the `fetch` (limit push-down) should receive the 
same treatment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

