ologlogn opened a new pull request, #22597:
URL: https://github.com/apache/datafusion/pull/22597
## Which issue does this PR close?
No existing issue. Discovered while investigating incorrect results from
`ARRAY_AGG(x ORDER BY y DESC)` in a multi-partition query.
## Rationale for this change
### Bug description
When multiple `ARRAY_AGG` expressions with conflicting `ORDER BY` directions
(ASC and DESC) appear in the same query, results for the DESC variants can be
silently wrong.
**Trigger conditions** (all must hold):
1. The input to the partial aggregate is pre-sorted ASC — either from a
`SortExec` added by the planner or from a pre-ordered scan.
2. A DESC `ARRAY_AGG` is present. The `OptimizeAggregateOrder` rule detects
that the existing ASC ordering satisfies the _reverse_ of the DESC requirement
and calls `reverse_expr()` on the partial accumulator, producing:
`is_input_pre_ordered=true`, `reverse=true`, `ordering_req=[ASC]`.
3. The query uses multi-partition execution (Partial + Final aggregate
stages).
### Root cause
In `OrderSensitiveArrayAggAccumulator::state()`:
```rust
let mut result = vec![self.evaluate()?]; // reverses values → DESC
order ✓
result.push(self.evaluate_orderings()?); // ordering keys stay in ASC
order ✗
```
`evaluate()` reverses `self.values` (producing DESC), but
`evaluate_orderings()` always iterated `self.ordering_values` forward (ASC).
The partial state therefore emits a **mismatched** pair: values in DESC order,
ordering keys in ASC order.
The final accumulator's `merge_batch` calls `merge_ordered_arrays` using the
ordering keys to decide merge priority. Because the keys are paired with the
wrong values, the k-way merge produces the wrong order — silently, no error or
panic.
### How to reproduce
The new unit test `desc_order_partial_final_merge_correct` directly
reproduces the failure:
1. Two partial accumulators with `is_input_pre_ordered=true, reverse=true,
ordering_req=[ASC]` accumulate rows `[0,1,2]` and `[3,4,5]` respectively
(simulating ASC pre-ordered input).
2. Their states are fed into a final accumulator (`ordering_req=[DESC],
reverse=false`) via `merge_batch`.
3. Without the fix, `evaluate()` returns `[3, 4, 5, 0, 1, 2]` instead of the
expected `[5, 4, 3, 2, 1, 0]`.
A real query that triggers this:
```sql
SELECT
grp,
ARRAY_AGG(ts ORDER BY ts ASC) AS ts_asc,
ARRAY_AGG(ts ORDER BY ts DESC) AS ts_desc
FROM t
GROUP BY grp
```
When the planner adds a `SortExec [ts ASC]` before the partial aggregate (to
exploit pre-ordering for the ASC aggregate), the DESC aggregate is reversed by
the optimizer and hits this bug.
## What changes are included in this PR?
Single change in `evaluate_orderings()` inside
`OrderSensitiveArrayAggAccumulator`:
```rust
// Before
let column_values = self.ordering_values.iter().map(|x| x[i].clone());
// After
let column_values: Box<dyn Iterator<Item = ScalarValue>> = if self.reverse {
Box::new(self.ordering_values.iter().rev().map(|x| x[i].clone()))
} else {
Box::new(self.ordering_values.iter().map(|x| x[i].clone()))
};
```
When `reverse=true`, ordering keys are iterated in reverse to stay
consistent with the reversed values emitted by `evaluate()`, so the final
`merge_batch` receives correctly paired (value, ordering_key) entries.
## Are these changes tested?
Yes. New unit test `desc_order_partial_final_merge_correct` in
`array_agg.rs` directly exercises the partial→final merge path with a reversed
accumulator. The test fails before this fix and passes after.
## Are there any user-facing changes?
Yes — `ARRAY_AGG(x ORDER BY y DESC)` now returns correct results when the
query uses multi-partition execution and the optimizer reverses the partial
accumulator.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]