alamb commented on code in PR #10913:
URL: https://github.com/apache/datafusion/pull/10913#discussion_r1641319524


##########
datafusion/sqllogictest/test_files/joins.slt:
##########
@@ -1382,18 +1382,17 @@ physical_plan
 02)--AggregateExec: mode=Final, gby=[], aggr=[COUNT(alias1)]
 03)----CoalescePartitionsExec
 04)------AggregateExec: mode=Partial, gby=[], aggr=[COUNT(alias1)]
-05)--------AggregateExec: mode=FinalPartitioned, gby=[alias1@0 as alias1], 
aggr=[]
-06)----------AggregateExec: mode=Partial, gby=[t1_id@0 as alias1], aggr=[]
-07)------------CoalesceBatchesExec: target_batch_size=2
-08)--------------HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(t1_id@0, t2_id@0)], projection=[t1_id@0]
-09)----------------CoalesceBatchesExec: target_batch_size=2
-10)------------------RepartitionExec: partitioning=Hash([t1_id@0], 2), 
input_partitions=2
-11)--------------------RepartitionExec: partitioning=RoundRobinBatch(2), 
input_partitions=1
-12)----------------------MemoryExec: partitions=1, partition_sizes=[1]
-13)----------------CoalesceBatchesExec: target_batch_size=2
-14)------------------RepartitionExec: partitioning=Hash([t2_id@0], 2), 
input_partitions=2
-15)--------------------RepartitionExec: partitioning=RoundRobinBatch(2), 
input_partitions=1
-16)----------------------MemoryExec: partitions=1, partition_sizes=[1]
+05)--------AggregateExec: mode=SinglePartitioned, gby=[t1_id@0 as alias1], 
aggr=[]

Review Comment:
   I agree this plan looks better and correct



##########
datafusion/core/src/physical_optimizer/combine_partial_final_agg.rs:
##########
@@ -144,8 +144,12 @@ fn can_combine(final_agg: GroupExprsRef, partial_agg: 
GroupExprsRef) -> bool {
     let (input_group_by, input_aggr_expr, input_filter_expr) =
         normalize_group_exprs(partial_agg);
 
-    final_group_by.eq(&input_group_by)

Review Comment:
   I am not sure that just checking the length of the group bys is sufficient 
-- I think logically they must be the same.
   
   It seems like the reason these weren't combined
   ```
   05)--------AggregateExec: mode=FinalPartitioned, gby=[alias1@0 as alias1], 
aggr=[]
   06)----------AggregateExec: mode=Partial, gby=[t1_id@0 as alias1], aggr=[]
   ```
   Is becase of aliasing the exprs didn't match exactly -- `t1_id@0 as alias1` 
didn't match `alias1@0 as alias1` even though I think they are logically 
equivalent
   
   So for example, if we ever made the following plan (with actually different 
grouping expressions) after this change the code would incorrectly collapse them
   
   ```
   05)--------AggregateExec: mode=FinalPartitioned, gby=[alias1 / 2 as alias1], 
aggr=[]
   06)----------AggregateExec: mode=Partial, gby=[t1_id@0 as alias1], aggr=[]
   ```
   
   However, I am not sure that such a plan would be valid 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to