alamb commented on code in PR #19287:
URL: https://github.com/apache/datafusion/pull/19287#discussion_r2789573897


##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -599,12 +598,13 @@ impl AggregateExec {
         // If existing ordering satisfies a prefix of the GROUP BY expressions,
         // prefix requirements with this section. In this case, aggregation 
will
         // work more efficiently.
-        let indices = get_ordered_partition_by_indices(&groupby_exprs, 
&input)?;
-        let mut new_requirements = indices
-            .iter()
-            .map(|&idx| {
-                PhysicalSortRequirement::new(Arc::clone(&groupby_exprs[idx]), 
None)
-            })
+        // Copy the `PhysicalSortExpr`s to retain the sort options.
+        let (new_sort_exprs, indices) =
+            input_eq_properties.find_longest_permutation(&groupby_exprs)?;
+
+        let mut new_requirements = new_sort_exprs
+            .into_iter()
+            .map(PhysicalSortRequirement::from)

Review Comment:
   Interestingly, I think this change has exposed a bug
   - https://github.com/apache/datafusion/issues/20244       
   
   Specifically what is happening in that ticket is that in some cases, the 
optimizer is now erroniously determining that the existing sort for the group 
keys matches. 
   
   Previously this issue was masked because
   
     1. 
[pushdown_would_violate_requirements](https://github.com/apache/datafusion/blob/d3ac7a351a3ad0e97f64d700d532bacab7b419dc/datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs#L393-L392)
 (...) uses 
[PhysicalSortRequirement::compatible](https://github.com/apache/datafusion/blob/a6fd5cc840d1b01ada8a48c7f2649789e86b256e/datafusion/physical-expr-common/src/sort_expr.rs#L303-L302)(...).
     2.  Since the options were `None`, the `compatible` would check with 
requirement (`None`) which is not compatible the `ASC` parent requirement.
   
   After this PR, the group by properly reports the necessary input ordering 
   
   
   Thus, before this PR, sort pushdown through aggregate was often (always?) 
blocked, which left the SortExec. After this PR, since the requirements now 
include options (`ASC`), the compatibility check passes (incorrectly), so 
pushdown continues.
    
   Then the logic in 
[handle_custom_pushdown](https://github.com/apache/datafusion/blob/d3ac7a351a3ad0e97f64d700d532bacab7b419dc/datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs#L605-L604)
 incorrectly rewrites sort keys by column index through AggregateExec instead 
of by expression



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to