jinchengchenghh opened a new issue, #9543:
URL: https://github.com/apache/incubator-gluten/issues/9543

   ### Description
   
   For queries that combine `count(distinct a)` with `sum(b)`, such as TPC-DS 
Q95, the Spark plan currently groups by `a` first and then aggregates over the 
groups. Velox, however, supports `aggregate.distinct = true` in 
HashAggregation, so there may be some potential to optimize here.
   ```
   Calling CudfHashJoinProbe::getOutput
   I20250506 14:04:44.365132  5450 WholeStageResultIterator.cc:354] Native Plan 
with stats for: [Stage: 35 TID: 90]
   -- Aggregation[21][PARTIAL n21_0 := sum_merge("n20_1"), n21_1 := 
sum_merge("n20_2"), n21_2 := count_partial("n18_5")] -> n21_0:DOUBLE, 
n21_1:DOUBLE, n21_2:BIGINT
      Output: 1 rows (24B, 1 batches), Cpu time: 599.29us, Wall time: 2.19ms, 
Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1, CPU 
breakdown: B/I/O/F (81.71us/3.75us/485.23us/28.60us)
         runningAddInputWallNanos     sum: 4.60us, count: 1, min: 4.60us, max: 
4.60us, avg: 4.60us
         runningFinishWallNanos       sum: 104.96us, count: 1, min: 104.96us, 
max: 104.96us, avg: 104.96us
         runningGetOutputWallNanos    sum: 1.65ms, count: 1, min: 1.65ms, max: 
1.65ms, avg: 1.65ms
     -- Aggregation[20][SINGLE [n18_5] n20_1 := sum_merge("n19_1"), n20_2 := 
sum_merge("n19_2")] -> n18_5:BIGINT, n20_1:DOUBLE, n20_2:DOUBLE
        Output: 8 rows (256B, 1 batches), Cpu time: 749.37us, Wall time: 
2.39ms, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, 
Threads: 1, CPU breakdown: B/I/O/F (81.36us/3.65us/640.41us/23.95us)
           queuedWallNanos              sum: 2.00us, count: 1, min: 2.00us, 
max: 2.00us, avg: 2.00us
           runningAddInputWallNanos     sum: 4.63us, count: 1, min: 4.63us, 
max: 4.63us, avg: 4.63us
           runningFinishWallNanos       sum: 68.09us, count: 1, min: 68.09us, 
max: 68.09us, avg: 68.09us
           runningGetOutputWallNanos    sum: 2.09ms, count: 1, min: 2.09ms, 
max: 2.09ms, avg: 2.09ms
       -- Aggregation[19][SINGLE [n18_5] n19_1 := sum_partial("n18_6"), n19_2 
:= sum_partial("n18_7")] -> n18_5:BIGINT, n19_1:DOUBLE, n19_2:DOUBLE
          Output: 8 rows (256B, 1 batches), Cpu time: 823.88us, Wall time: 
2.57ms, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, 
Threads: 1, CPU breakdown: B/I/O/F (56.28us/17.21us/723.50us/26.89us)
   ```
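
To make the two strategies concrete, here is a minimal Python sketch (not Gluten or Velox code) that compares the group-by-then-aggregate shape seen in the plan above with a single aggregation that tracks distinct values directly. The function names `two_stage` and `single_pass` and the sample rows are hypothetical, and NULL handling is ignored for brevity.

```python
from collections import defaultdict


def two_stage(rows):
    """Group by `a` first, then aggregate over the groups."""
    partial = defaultdict(float)
    for a, b in rows:
        # Stage 1: GROUP BY a, keeping a partial sum(b) per group.
        partial[a] += b
    # Stage 2: the number of groups is count(distinct a);
    # merging the partial sums gives sum(b).
    return len(partial), sum(partial.values())


def single_pass(rows):
    """One aggregation that deduplicates `a` while summing `b`."""
    seen = set()
    total = 0.0
    for a, b in rows:
        seen.add(a)   # plays the role of a distinct flag on count(a)
        total += b
    return len(seen), total


# Hypothetical input rows of (a, b).
rows = [(1, 10.0), (1, 5.0), (2, 2.5), (3, 1.0), (2, 4.0)]
print(two_stage(rows), single_pass(rows))
```

Both functions return the same pair for the same input. The difference is only in how many aggregation steps are needed to get there, which is the potential saving this issue is about.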
   
   ### Gluten version
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

