zhuqi-lucas opened a new issue, #23238:
URL: https://github.com/apache/datafusion/issues/23238
## Is your feature request related to a problem or challenge?
`CoalescePartitionsExec` carries an optional `fetch: Option<usize>` so
`LimitPushdown` can fold `GlobalLimit(N) -> CoalescePartitionsExec`
into `CoalescePartitionsExec(fetch=N)`. The fetch semantics were
tightened in #14418 and #18245, the latter making them consistent
across single-partition and multi-partition cases.
`UnionExec` has no equivalent. When `LimitPushdown` sees
`GlobalLimit(N) -> UnionExec`, it has to leave a wrapping
`CoalescePartitionsExec(fetch=N)` (or a separate `GlobalLimitExec`)
on top to enforce the cap. Any downstream optimizer pass that peels
the `Union` apart for rewriting (e.g. distributed execution rewrites)
has to remember to re-attach that wrapper; otherwise the union
returns up to N rows per child and the LIMIT is silently violated.
This is the same class of bug #18245 closed for
`CoalescePartitionsExec`, but here the inconsistency is single-child
vs multi-child `Union`.
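
To make the failure mode concrete, here is a minimal sketch using a toy plan tree. The `Plan` enum, `row_count`, and `peel_and_reassemble` are hypothetical names invented for illustration and are not DataFusion APIs; the sketch only assumes that the limit has been pushed into each child while the global cap lives in a separate wrapper.

```rust
/// Toy plan node (hypothetical; not DataFusion's `ExecutionPlan`).
#[derive(Debug, Clone)]
enum Plan {
    /// Leaf producing `rows` rows, optionally capped by a pushed-down fetch.
    Scan { rows: usize, fetch: Option<usize> },
    /// Concatenates the output of all children; carries no cap of its own.
    Union(Vec<Plan>),
    /// Wrapper that enforces the global cap on its input.
    Limit { fetch: usize, input: Box<Plan> },
}

/// Number of rows the plan would emit.
fn row_count(plan: &Plan) -> usize {
    match plan {
        Plan::Scan { rows, fetch } => fetch.map_or(*rows, |f| f.min(*rows)),
        Plan::Union(children) => children.iter().map(row_count).sum(),
        Plan::Limit { fetch, input } => row_count(input).min(*fetch),
    }
}

/// A buggy rewrite: it peels the union out of its wrapper, rebuilds the
/// union from its children, and forgets to re-attach the wrapper.
fn peel_and_reassemble(plan: &Plan) -> Plan {
    match plan {
        Plan::Limit { input, .. } => peel_and_reassemble(input),
        Plan::Union(children) => Plan::Union(children.clone()),
        other => other.clone(),
    }
}
```

With `LIMIT 10` over two children that each hold 100 rows and carry `fetch: Some(10)`, the wrapped plan emits 10 rows, while the plan returned by `peel_and_reassemble` emits 20.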
## Describe the solution you'd like
Add `fetch: Option<usize>` to `UnionExec`, plus:
- `with_fetch()` builder method
- `execute()` honors `fetch`: stop emitting once N rows have been
produced and cancel pending children, the same way
`SortPreservingMergeExec` does it
- `LimitPushdown` learns to fold `GlobalLimit(N) -> Union` into
`Union(fetch=N)`, mirroring the existing fold for `Coalesce`
- `with_new_children` preserves `fetch` so rewrites that reassemble
a `Union` from transformed children cannot silently drop the cap
- Behavior is consistent across single-child and multi-child `Union`
(the analog of #18245 for this operator)
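
The list above could be sketched roughly as follows. This is a toy, synchronous model and not the proposed implementation: `ToyUnion` and its methods are hypothetical, each child is reduced to a list of batch row counts, and children are drained one after another. The real `UnionExec` is asynchronous and exposes its children's partitions as separate output partitions, so how the cap applies across partitions is left open here.

```rust
/// Toy union node (hypothetical; not DataFusion's `UnionExec`).
/// Each child is modeled as a list of batch sizes (row counts).
#[derive(Debug, Clone, PartialEq)]
struct ToyUnion {
    children: Vec<Vec<usize>>,
    fetch: Option<usize>,
}

impl ToyUnion {
    fn new(children: Vec<Vec<usize>>) -> Self {
        Self { children, fetch: None }
    }

    /// Builder-style setter for the row cap.
    fn with_fetch(mut self, fetch: Option<usize>) -> Self {
        self.fetch = fetch;
        self
    }

    /// Rebuilds the node from transformed children, preserving `fetch`
    /// so a peel-and-reassemble rewrite cannot drop the cap.
    fn with_new_children(&self, children: Vec<Vec<usize>>) -> Self {
        Self { children, fetch: self.fetch }
    }

    /// Emits batch sizes until `fetch` rows have been produced, truncating
    /// the last batch and skipping all remaining input once the cap is hit.
    fn execute(&self) -> Vec<usize> {
        let mut remaining = self.fetch.unwrap_or(usize::MAX);
        let mut out = Vec::new();
        'children: for child in &self.children {
            for &batch in child {
                if remaining == 0 {
                    break 'children; // cap reached: stop pulling from any child
                }
                let take = batch.min(remaining);
                out.push(take);
                remaining -= take;
            }
        }
        out
    }
}
```

In this model the cap behaves the same for one child and for many: `fetch = Some(10)` yields batches `[4, 4, 2]` from two children of `[4, 4]` each, and `[6, 4]` from a single child of `[6, 6]`.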
## Describe alternatives you've considered
Keeping a separate `CoalescePartitionsExec(fetch=N)` /
`GlobalLimitExec` on top of `Union`. This works as long as every
consumer of `Union` remembers to preserve that wrapper across
rewrites. We hit a bug in a downstream fork where a
peel-and-reassemble pass dropped the wrapper, and it surfaced only in
multi-shard topologies. Encoding the cap on the `Union` node itself
removes the whole class of "forgot to re-attach the wrapper" bugs.
## Additional context
Prior `fetch` work in this area: #14418, #18245, #21170. Happy to
put up the PR if maintainers agree on the shape.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]