jayzhan211 commented on PR #12996:
URL: https://github.com/apache/datafusion/pull/12996#issuecomment-2451402632

   TPC-H q14 doesn't seem to go through the code changed in this PR: `groupBy` 
is empty in `AggregateExec`, and I also don't see any print output from 
`VectorizedGroupValuesColumn`. I don't think this change is the cause of the 
slowdown 🤔 
   
   ```
   query TT
   explain select
               100.00 * sum(case
                                when p_type like 'PROMO%'
                                    then l_extendedprice * (1 - l_discount)
                                else 0
               end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
   from
       lineitem,
       part
   where
           l_partkey = p_partkey
     and l_shipdate >= date '1995-09-01'
     and l_shipdate < date '1995-10-01';
   ----
   logical_plan
   01)Projection: Float64(100) * CAST(sum(CASE WHEN part.p_type LIKE Utf8("PROMO%") THEN lineitem.l_extendedprice * Int64(1) - lineitem.l_discount ELSE Int64(0) END) AS Float64) / CAST(sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount) AS Float64) AS promo_revenue
   02)--Aggregate: groupBy=[[]], aggr=[[sum(CASE WHEN part.p_type LIKE Utf8("PROMO%") THEN __common_expr_1 ELSE Decimal128(Some(0),38,4) END) AS sum(CASE WHEN part.p_type LIKE Utf8("PROMO%") THEN lineitem.l_extendedprice * Int64(1) - lineitem.l_discount ELSE Int64(0) END), sum(__common_expr_1) AS sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)]]
   03)----Projection: lineitem.l_extendedprice * (Decimal128(Some(1),20,0) - lineitem.l_discount) AS __common_expr_1, part.p_type
   04)------Inner Join: lineitem.l_partkey = part.p_partkey
   05)--------Projection: lineitem.l_partkey, lineitem.l_extendedprice, lineitem.l_discount
   06)----------Filter: lineitem.l_shipdate >= Date32("1995-09-01") AND lineitem.l_shipdate < Date32("1995-10-01")
   07)------------TableScan: lineitem projection=[l_partkey, l_extendedprice, l_discount, l_shipdate], partial_filters=[lineitem.l_shipdate >= Date32("1995-09-01"), lineitem.l_shipdate < Date32("1995-10-01")]
   08)--------TableScan: part projection=[p_partkey, p_type]
   physical_plan
   01)ProjectionExec: expr=[100 * CAST(sum(CASE WHEN part.p_type LIKE Utf8("PROMO%") THEN lineitem.l_extendedprice * Int64(1) - lineitem.l_discount ELSE Int64(0) END)@0 AS Float64) / CAST(sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)@1 AS Float64) as promo_revenue]
   02)--AggregateExec: mode=Final, gby=[], aggr=[sum(CASE WHEN part.p_type LIKE Utf8("PROMO%") THEN lineitem.l_extendedprice * Int64(1) - lineitem.l_discount ELSE Int64(0) END), sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)]
   03)----CoalescePartitionsExec
   04)------AggregateExec: mode=Partial, gby=[], aggr=[sum(CASE WHEN part.p_type LIKE Utf8("PROMO%") THEN lineitem.l_extendedprice * Int64(1) - lineitem.l_discount ELSE Int64(0) END), sum(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)]
   05)--------ProjectionExec: expr=[l_extendedprice@0 * (Some(1),20,0 - l_discount@1) as __common_expr_1, p_type@2 as p_type]
   06)----------CoalesceBatchesExec: target_batch_size=8192
   07)------------HashJoinExec: mode=Partitioned, join_type=Inner, on=[(l_partkey@0, p_partkey@0)], projection=[l_extendedprice@1, l_discount@2, p_type@4]
   08)--------------CoalesceBatchesExec: target_batch_size=8192
   09)----------------RepartitionExec: partitioning=Hash([l_partkey@0], 4), input_partitions=4
   10)------------------CoalesceBatchesExec: target_batch_size=8192
   11)--------------------FilterExec: l_shipdate@3 >= 1995-09-01 AND l_shipdate@3 < 1995-10-01, projection=[l_partkey@0, l_extendedprice@1, l_discount@2]
   12)----------------------CsvExec: file_groups={4 groups: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:0..18561749], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:18561749..37123498], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:37123498..55685247], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/lineitem.tbl:55685247..74246996]]}, projection=[l_partkey, l_extendedprice, l_discount, l_shipdate], has_header=false
   13)--------------CoalesceBatchesExec: target_batch_size=8192
   14)----------------RepartitionExec: partitioning=Hash([p_partkey@0], 4), input_partitions=4
   15)------------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
   16)--------------------CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/part.tbl]]}, projection=[p_partkey, p_type], has_header=false
   
   ```
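
The same check can be done mechanically on the plan text. Here is a rough sketch, not part of DataFusion: it scans EXPLAIN output for `AggregateExec` lines and reports whether any has a non-empty `gby` list. The helper names and the regex are my own assumptions about the plan format shown above, `Q14_PLAN` abbreviates the two aggregate lines from the plan (with `aggr` shortened to `...`), and `GROUPED_PLAN` is a made-up line for contrast only. It assumes each `AggregateExec` line is not wrapped before `aggr=`.

```python
import re

# Captures the mode and the contents of gby=[...] on an AggregateExec line.
# Assumes the "AggregateExec: mode=..., gby=[...], aggr=" layout seen in the
# physical plan above; other DataFusion versions may print it differently.
GBY_RE = re.compile(r"AggregateExec: mode=(\w+), gby=\[(.*?)\], aggr=")


def aggregate_group_keys(plan_text):
    """Return a (mode, gby_contents) pair for each AggregateExec line."""
    return [(m.group(1), m.group(2).strip()) for m in GBY_RE.finditer(plan_text)]


def uses_grouped_aggregation(plan_text):
    """True if at least one AggregateExec has a non-empty gby list."""
    return any(keys for _, keys in aggregate_group_keys(plan_text))


# Abbreviated from the q14 physical plan above (aggr contents elided).
Q14_PLAN = """\
02)--AggregateExec: mode=Final, gby=[], aggr=[...]
03)----CoalescePartitionsExec
04)------AggregateExec: mode=Partial, gby=[], aggr=[...]
"""

# Hypothetical line, invented only to show the non-empty case.
GROUPED_PLAN = "AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b], aggr=[...]"

if __name__ == "__main__":
    print(aggregate_group_keys(Q14_PLAN))
    print(uses_grouped_aggregation(Q14_PLAN))
    print(uses_grouped_aggregation(GROUPED_PLAN))
```

For q14 both aggregates have an empty `gby`, which matches the observation that no group keys are involved in this query.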


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

