alamb commented on code in PR #19287:
URL: https://github.com/apache/datafusion/pull/19287#discussion_r2789573897
##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -599,12 +598,13 @@ impl AggregateExec {
// If existing ordering satisfies a prefix of the GROUP BY expressions,
// prefix requirements with this section. In this case, aggregation
will
// work more efficiently.
- let indices = get_ordered_partition_by_indices(&groupby_exprs,
&input)?;
- let mut new_requirements = indices
- .iter()
- .map(|&idx| {
- PhysicalSortRequirement::new(Arc::clone(&groupby_exprs[idx]),
None)
- })
+ // Copy the `PhysicalSortExpr`s to retain the sort options.
+ let (new_sort_exprs, indices) =
+ input_eq_properties.find_longest_permutation(&groupby_exprs)?;
+
+ let mut new_requirements = new_sort_exprs
+ .into_iter()
+ .map(PhysicalSortRequirement::from)
Review Comment:
Interestingly, I think this change has exposed a bug
- https://github.com/apache/datafusion/issues/20244
Specifically what is happening in that ticket is that in some cases, the
optimizer is now erroniously determining that the existing sort for the group
keys matches.
Previously this issue was masked because
1.
[pushdown_would_violate_requirements](https://github.com/apache/datafusion/blob/d3ac7a351a3ad0e97f64d700d532bacab7b419dc/datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs#L393-L392)
(...) uses
[PhysicalSortRequirement::compatible](https://github.com/apache/datafusion/blob/a6fd5cc840d1b01ada8a48c7f2649789e86b256e/datafusion/physical-expr-common/src/sort_expr.rs#L303-L302)(...).
2. Since the options were `None`, the `compatible` would check with
requirement (`None`) which is not compatible the `ASC` parent requirement.
After this PR, the group by properly reports the necessary input ordering
Thus, before this PR, sort pushdown through aggregate was often (always?)
blocked, which left the SortExec. After this PR, since the requirements now
include options (`ASC`), the compatibility check passes (incorrectly), so
pushdown continues.
Then the logic in
[handle_custom_pushdown](https://github.com/apache/datafusion/blob/d3ac7a351a3ad0e97f64d700d532bacab7b419dc/datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs#L605-L604)
incorrectly rewrites sort keys by column index through AggregateExec instead
of by expression
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]