Jeadie opened a new pull request, #23199:
URL: https://github.com/apache/datafusion/pull/23199
## Which issue does this PR close?
None — filing the fix directly. Happy to open a tracking issue if
maintainers prefer.
## Rationale for this change
`EnforceSorting`/`EnsureRequirements` can produce a physically invalid plan
that `SanityCheckPlan` rejects:
```
... does not satisfy order requirements: [score@1 DESC NULLS LAST, id@0
ASC]. Child-0 order: []
```
**Root cause.** In `sort_pushdown.rs::pushdown_requirement_to_children`, the
fetch-forwarding branch
```rust
} else if plan.fetch().is_some()
&& plan.supports_limit_pushdown()
&& plan.maintains_input_order().into_iter().all(|m| m)
{
... Ok(Some(vec![Some(parent_required)])) // forwards the requirement
UNCHANGED
}
```
forwards the parent ordering requirement to the child **unchanged**. A
`ProjectionExec` reports a `fetch()` (forwarded from its input) and can
renumber/reorder columns, so the requirement — expressed in the projection's
**output** schema — is pushed into the **child** schema verbatim. For a
projection that reorders the sort key (e.g. output `[id, score, value]` over
input `[id, value, score]`), `score@1` (valid above) is pushed down as
`score@1` (a different column below). The relocated `SortExec` then advertises
an ordering its child cannot provide, and `SanityCheckPlan` fails.
This reproduces whenever a fetch-bearing global sort sits above a
`CoalescePartitionsExec` over a column-reordering, order-preserving
`ProjectionExec` whose input is already ordered — `parallelize_sorts` sinks the
per-partition `SortExec` below the projection without remapping the key index.
## What changes are included in this PR?
Remap the ordering requirement through the projection's column mapping
before pushing it down (new `remap_requirement_through_projection` helper):
- Each required column at output index `i` is rewritten to the column the
projection produces at that index (`projection.expr()[i]`).
- If any required column maps to a **computed** (non-`Column`) expression,
the ordering cannot be expressed below the projection, so pushdown is declined
and the sort is kept **above** the projection.
- Hard/soft-ness and all alternatives of the `OrderingRequirements` are
preserved; unsatisfiable alternatives are dropped.
For non-reordering projections the remap is the identity, so existing plans
are unchanged.
## Are these changes tested?
Yes.
- Unit tests for the remap helper: reordering remap, softness preservation,
multiple hard alternatives, dropping an unsatisfiable alternative while keeping
others, and declining when a required column is computed.
- An end-to-end regression test
(`test_parallelize_sorts_remaps_index_through_reordering_projection`) builds
the multi-partition reordering-projection plan, runs `EnsureRequirements` with
`repartition_sorts` enabled, and asserts the result passes `SanityCheckPlan`.
It fails without this fix and passes with it.
## Are there any user-facing changes?
No API changes. Plans that previously failed `SanityCheckPlan` (or produced
incorrect orderings) for this shape are now valid; all other plans are
unchanged.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]