Jeadie commented on PR #23199:
URL: https://github.com/apache/datafusion/pull/23199#issuecomment-4805319367
### Worked example
#### Before `EnsureRequirements` (still valid)
```
SortExec: TopK(fetch=4), expr=[score@1 DESC, a@0 ASC] out: [a,
score, value]
CoalescePartitionsExec out: [a,
score, value]
ProjectionExec: [a@0 as a, c@2 as score, b@1 as value] out: [a,
score, value]
SortExec: TopK(fetch=1000), expr=[c@2 DESC] out: [a, b,
c, d, e]
RepartitionExec: RoundRobinBatch out: [a, b,
c, d, e]
DataSourceExec out: [a, b,
c, d, e]
```
Resolving the sort expressions:
- Outer SortExec: its input is CoalescePartitionsExec → [a, score, value].
So score@1 = index 1 = score ✓.
- ProjectionExec: its input is the inner sort → [a, b, c, d, e]. So c@2 =
c ✓; output renames it to score at position 1.
- Inner SortExec: input [a, b, c, d, e], c@2 = c ✓.
#### After EnsureRequirements, without the fix (invalid)
`parallelize_sorts` replaces `CoalescePartitions` + the global sort with a
`SortPreservingMergeExec`, and pushes a per-partition `SortExec` below the
projection; copying `expr=[score@1, a@0]` down unchanged:
```
SortPreservingMergeExec: [score@1 DESC, a@0 ASC], fetch=4 out: [a,
score, value]
ProjectionExec: [a@0 as a, c@2 as score, b@1 as value] out: [a,
score, value]
SortExec: TopK(fetch=4), expr=[score@1 DESC, a@0 ASC] out: [a, b,
c, d, e] ◀── now BELOW the projection
SortExec: TopK(fetch=1000), expr=[c@2 DESC] out: [a, b,
c, d, e]
RepartitionExec: RoundRobinBatch out: [a, b,
c, d, e]
DataSourceExec out: [a, b,
c, d, e]
```
The relocated `SortExec`'s input is now [a, b, c, d, e] (not [a, score,
value]). So its `score@1` resolves against the wrong schema:
- @0 = a ✓
- @1 = b ✗ there is no score at this level; index 1 is b. The sort
silently orders by b, not score.
`SanityCheckPlan` rejects:
- The `SortPreservingMergeExec` (top) requires its input ordered by
`[score@1, a@0]` over `[a, score, value]`.
- Its child projection derives its output ordering from the relocated
sort, which orders by `[b@1, a@0]`. The projection maps `b@1 → value@2`, so it
can only offer `[value@2, a@0]` upward.
- Nothing satisfies `score@1`, and therefore raises `does not satisfy
order requirements: [score@1 DESC NULLS LAST, a@0 ASC]. Child-0 order: []`.
#### After EnsureRequirements, with the fix (valid)
The requirement is remapped as it crosses the projection (score@1 → c@2), so
the relocated sort indexes correctly into [a, b, c, d, e]:
```
SortPreservingMergeExec: [score@1 DESC, a@0 ASC], fetch=4 out: [a,
score, value]
ProjectionExec: [a@0 as a, c@2 as score, b@1 as value] out: [a,
score, value]
SortExec: TopK(fetch=4), expr=[c@2 DESC, a@0 ASC] out: [a, b,
c, d, e] ◀── remapped @1 -> @2
SortExec: TopK(fetch=1000), expr=[c@2 DESC] out: [a, b,
c, d, e]
RepartitionExec: RoundRobinBatch out: [a, b,
c, d, e]
DataSourceExec out: [a, b,
c, d, e]
```
Now the relocated sort orders by `[c@2, a@0] = [c, a]`; the projection
maps `c@2 → score@1`, so its output is ordered by `[score@1, a@0]` (what the
`SortPreservingMergeExec` needs). Valid.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]