zhuqi-lucas opened a new pull request, #21170:
URL: https://github.com/apache/datafusion/pull/21170

   ## Which issue does this PR close?
   
   - Closes #21169.
   
   ## Rationale for this change
   
   When `LimitPushdown` merges a `GlobalLimitExec` into a 
`CoalescePartitionsExec` or `SortPreservingMergeExec` as a `fetch` value, the 
`EnforceDistribution` optimizer rule strips and re-inserts these 
distribution-changing operators **without preserving the `fetch`**. This 
silently drops the LIMIT for queries over multi-partition sources, so they can 
return more rows than the limit allows.
   
   ## What changes are included in this PR?
   
   1. **`remove_dist_changing_operators`** now captures any `fetch` value (and 
the original `SortPreservingMergeExec` if present) before stripping operators.
   2. **`add_merge_on_top`** accepts an optional `fetch` and applies it to the 
newly created `SortPreservingMergeExec`.
   3. **Fallback re-introduction**: if the `fetch` was not consumed by 
`add_merge_on_top` (e.g., when the parent had `UnspecifiedDistribution` or the 
child already had a single partition), the limit is re-introduced as a wrapping 
operator so it is never silently lost.
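
   The three steps above can be sketched with a toy model. This is only an 
illustration under stated assumptions, not the code in this PR: the `Plan` enum, 
`strip_dist_changing`, `partition_count`, and `enforce` are hypothetical names, 
the toy `add_merge_on_top` has a simplified signature, and the wrapping operator 
used for the fallback is modeled as a generic `Limit` node because the 
description does not say which operator is used.

```rust
/// Toy stand-in for a physical plan. This is NOT DataFusion's `ExecutionPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Source { partitions: usize },
    CoalescePartitions { input: Box<Plan>, fetch: Option<usize> },
    SortPreservingMerge { input: Box<Plan>, fetch: Option<usize> },
    /// Hypothetical wrapping operator used for the fallback path.
    Limit { input: Box<Plan>, fetch: usize },
}

/// Step 1: strip distribution-changing operators from the top of the plan,
/// remembering the tightest `fetch` found on any of them.
fn strip_dist_changing(mut plan: Plan) -> (Plan, Option<usize>) {
    let mut fetch: Option<usize> = None;
    loop {
        match plan {
            Plan::CoalescePartitions { input, fetch: f }
            | Plan::SortPreservingMerge { input, fetch: f } => {
                fetch = match (fetch, f) {
                    (Some(a), Some(b)) => Some(a.min(b)),
                    (a, b) => a.or(b),
                };
                plan = *input;
            }
            other => return (other, fetch),
        }
    }
}

/// Number of output partitions in the toy model.
fn partition_count(plan: &Plan) -> usize {
    match plan {
        Plan::Source { partitions } => *partitions,
        Plan::CoalescePartitions { .. } | Plan::SortPreservingMerge { .. } => 1,
        Plan::Limit { input, .. } => partition_count(input),
    }
}

/// Step 2: add a merge on top of a multi-partition plan. The captured
/// `fetch` is consumed (`take`) only if a merge is actually created.
fn add_merge_on_top(plan: Plan, fetch: &mut Option<usize>) -> Plan {
    if partition_count(&plan) > 1 {
        Plan::SortPreservingMerge { input: Box::new(plan), fetch: fetch.take() }
    } else {
        plan
    }
}

/// Steps 1 to 3 combined: strip, optionally re-merge, then fall back to a
/// wrapping limit if the `fetch` was not consumed.
fn enforce(plan: Plan, parent_needs_single_partition: bool) -> Plan {
    let (mut plan, mut fetch) = strip_dist_changing(plan);
    if parent_needs_single_partition {
        plan = add_merge_on_top(plan, &mut fetch);
    }
    if let Some(n) = fetch {
        plan = Plan::Limit { input: Box::new(plan), fetch: n };
    }
    plan
}
```

   In this sketch, a `SortPreservingMerge` with `fetch: Some(3)` over a 
four-partition source comes back unchanged when the parent needs a single 
partition, while a `CoalescePartitions` with `fetch: Some(1)` under a parent with 
no distribution requirement comes back as a `Limit` with `fetch: 1`. In both 
cases the limit survives the strip-and-reinsert cycle.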
   
   ## Are these changes tested?
   
   Yes, three new tests are added:
   
   - `coalesce_partitions_fetch_preserved_by_enforce_distribution` — unsorted 
multi-partition source with `CoalescePartitionsExec(fetch=1)`
   - `coalesce_partitions_fetch_preserved_sorted` — sorted multi-partition 
source with `CoalescePartitionsExec(fetch=5)`
   - `spm_fetch_preserved_by_enforce_distribution` — sorted multi-partition 
source with `SortPreservingMergeExec(fetch=3)`
   
   ## Are there any user-facing changes?
   
   No API changes. Queries with LIMIT over multi-partition sources will now 
correctly preserve the limit through the `EnforceDistribution` optimizer pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

