alamb commented on code in PR #10913: URL: https://github.com/apache/datafusion/pull/10913#discussion_r1641319524
########## datafusion/sqllogictest/test_files/joins.slt: ########## @@ -1382,18 +1382,17 @@ physical_plan 02)--AggregateExec: mode=Final, gby=[], aggr=[COUNT(alias1)] 03)----CoalescePartitionsExec 04)------AggregateExec: mode=Partial, gby=[], aggr=[COUNT(alias1)] -05)--------AggregateExec: mode=FinalPartitioned, gby=[alias1@0 as alias1], aggr=[] -06)----------AggregateExec: mode=Partial, gby=[t1_id@0 as alias1], aggr=[] -07)------------CoalesceBatchesExec: target_batch_size=2 -08)--------------HashJoinExec: mode=Partitioned, join_type=Inner, on=[(t1_id@0, t2_id@0)], projection=[t1_id@0] -09)----------------CoalesceBatchesExec: target_batch_size=2 -10)------------------RepartitionExec: partitioning=Hash([t1_id@0], 2), input_partitions=2 -11)--------------------RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1 -12)----------------------MemoryExec: partitions=1, partition_sizes=[1] -13)----------------CoalesceBatchesExec: target_batch_size=2 -14)------------------RepartitionExec: partitioning=Hash([t2_id@0], 2), input_partitions=2 -15)--------------------RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1 -16)----------------------MemoryExec: partitions=1, partition_sizes=[1] +05)--------AggregateExec: mode=SinglePartitioned, gby=[t1_id@0 as alias1], aggr=[] Review Comment: I agree this plan looks better and correct ########## datafusion/core/src/physical_optimizer/combine_partial_final_agg.rs: ########## @@ -144,8 +144,12 @@ fn can_combine(final_agg: GroupExprsRef, partial_agg: GroupExprsRef) -> bool { let (input_group_by, input_aggr_expr, input_filter_expr) = normalize_group_exprs(partial_agg); - final_group_by.eq(&input_group_by) Review Comment: I am not sure that just checking the length of the group bys is sufficient -- I think logically they must be the same. It seems like the reason these weren't combined ``` 05)--------AggregateExec: mode=FinalPartitioned, gby=[alias1@0 as alias1], aggr=[] 06)----------AggregateExec: mode=Partial, gby=[t1_id@0 as alias1], aggr=[] ``` Is becase of aliasing the exprs didn't match exactly -- `t1_id@0 as alias1` didn't match `alias1@0 as alias1` even though I think they are logically equivalent So for example, if we ever made the following plan (with actually different grouping expressions) after this change the code would incorrectly collapse them ``` 05)--------AggregateExec: mode=FinalPartitioned, gby=[alias1 / 2 as alias1], aggr=[] 06)----------AggregateExec: mode=Partial, gby=[t1_id@0 as alias1], aggr=[] ``` However, I am not sure that such a plan would be valid 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org