lgbo-ustc opened a new issue, #7647:
URL: https://github.com/apache/incubator-gluten/issues/7647

   ### Description
   
   Consider a simple example:
   ```sql
   select 
     n_regionkey, n_nationkey, sum(n_regionkey), count(n_name) 
   from
     tpch_pq.nation 
   group by 
     n_regionkey, n_nationkey with cube order by n_regionkey, n_nationkey;
   ```
   
   We get the following plan:
   ```
   AdaptiveSparkPlan isFinalPlan=false
   +- Sort [n_regionkey#16L ASC NULLS FIRST, n_nationkey#17L ASC NULLS FIRST], 
true, 0
      +- Exchange rangepartitioning(n_regionkey#16L ASC NULLS FIRST, 
n_nationkey#17L ASC NULLS FIRST, 5), ENSURE_REQUIREMENTS, [plan_id=136]
         +- HashAggregate(keys=[n_regionkey#16L, n_nationkey#17L, 
spark_grouping_id#15L], functions=[sum(n_regionkey#7L), count(n_name#6)])
            +- Exchange hashpartitioning(n_regionkey#16L, n_nationkey#17L, 
spark_grouping_id#15L, 5), ENSURE_REQUIREMENTS, [plan_id=133]
               +- HashAggregate(keys=[n_regionkey#16L, n_nationkey#17L, 
spark_grouping_id#15L], functions=[partial_sum(n_regionkey#7L), 
partial_count(n_name#6)])
                  +- Expand [[n_name#6, n_regionkey#7L, n_regionkey#7L, 
n_nationkey#5L, 0], [n_name#6, n_regionkey#7L, n_regionkey#7L, null, 1], 
[n_name#6, n_regionkey#7L, null, n_nationkey#5L, 2], [n_name#6, n_regionkey#7L, 
null, null, 3]], [n_name#6, n_regionkey#7L, n_regionkey#16L, n_nationkey#17L, 
spark_grouping_id#15L]
                     +- Project [n_name#6, n_regionkey#7L, n_regionkey#7L, 
n_nationkey#5L]
                        +- FileScan parquet 
tpch_pq.nation[n_nationkey#5L,n_name#6,n_regionkey#7L] Batched: true,
   ```
   
   The `Expand` could be moved after the first `HashAggregate`, so that it multiplies the partial aggregation results instead of the raw input rows, making the plan look like:
   ```
   AdaptiveSparkPlan isFinalPlan=false
   +- Sort [n_regionkey#47L ASC NULLS FIRST, n_nationkey#48L ASC NULLS FIRST], 
true, 0
      +- Exchange rangepartitioning(n_regionkey#47L ASC NULLS FIRST, 
n_nationkey#48L ASC NULLS FIRST, 5), ENSURE_REQUIREMENTS, [plan_id=348]
         +- HashAggregate(keys=[n_regionkey#47L, n_nationkey#48L, 
spark_grouping_id#46L], functions=[sum(n_regionkey#2L), count(n_name#1)])
            +- Exchange hashpartitioning(n_regionkey#47L, n_nationkey#48L, 
spark_grouping_id#46L, 5), ENSURE_REQUIREMENTS, [plan_id=346]
               +- Expand [[n_regionkey#2L, n_nationkey#0L, 0, sum#51L, 
count#52L], [n_regionkey#2L, null, 1, sum#51L, count#52L], [null, 
n_nationkey#0L, 2, sum#51L, count#52L], [null, null, 3, sum#51L, count#52L]], 
[n_regionkey#47L, n_nationkey#48L, spark_grouping_id#46L, sum#51L, count#52L]
                  +- HashAggregate(keys=[n_regionkey#2L, n_nationkey#0L], 
functions=[partial_sum(n_regionkey#2L), partial_count(n_name#1)])
                     +- Project [n_name#1, n_regionkey#2L, n_regionkey#2L, 
n_nationkey#0L]
                        +- FileScan parquet 
tpch_pq.nation[n_nationkey#0L,n_name#1,n_regionkey#2L] Batched: true
   ```
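   To illustrate why the reordering is safe, here is a minimal sketch (plain Python, with made-up toy data rather than the real `tpch_pq.nation` table) showing that for decomposable aggregates like `sum` and `count`, expanding partial aggregation states per grouping set and then merging them produces the same result as expanding every raw row first:

   ```python
   from collections import defaultdict

   # Toy rows: (n_nationkey, n_name, n_regionkey) -- hypothetical sample data.
   rows = [(0, "ALGERIA", 0), (1, "ARGENTINA", 1),
           (2, "BRAZIL", 1), (3, "CANADA", 1), (4, "EGYPT", 4)]

   # CUBE on (n_regionkey, n_nationkey): 4 grouping sets, encoded by
   # spark_grouping_id bits (a set bit means the key is nulled out).
   def grouping_sets(region, nation):
       return [(region, nation, 0), (region, None, 1),
               (None, nation, 2), (None, None, 3)]

   # Current plan: Expand raw rows, then aggregate by (keys, grouping_id).
   def expand_then_aggregate(rows):
       agg = defaultdict(lambda: [0, 0])  # key -> [sum(regionkey), count(name)]
       for nation, name, region in rows:
           for key in grouping_sets(region, nation):
               agg[key][0] += region  # sum uses the un-masked duplicate column
               agg[key][1] += 1
       return dict(agg)

   # Proposed plan: partial-aggregate by the full key set first, Expand the
   # partial states, then merge the states per (keys, grouping_id).
   def aggregate_then_expand(rows):
       partial = defaultdict(lambda: [0, 0])  # (region, nation) -> [sum, count]
       for nation, name, region in rows:
           partial[(region, nation)][0] += region
           partial[(region, nation)][1] += 1
       final = defaultdict(lambda: [0, 0])
       for (region, nation), (s, c) in partial.items():
           for key in grouping_sets(region, nation):
               final[key][0] += s
               final[key][1] += c
       return dict(final)

   assert expand_then_aggregate(rows) == aggregate_then_expand(rows)
   ```

   The upside is that `Expand` (and the shuffle above it) now sees at most one row per distinct full grouping key instead of every input row, each multiplied by the number of grouping sets.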
   
   

