Rachelint commented on issue #11680:
URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2328012643

   > > It seems the CPU cost of RepartitionExec and CoalesceBatchesExec is 
not the bottleneck for Q32
   > 
   > The single-aggregate performance benefits that @jayzhan211 experimented 
with and showed in #11762 and #11777 are on ClickBench Q17/Q18, not 
Q32.
   > 
   > As of today, I see that Q32 performance is comparable to that in DuckDB on 
an M3 Mac.
   > 
   > ```
   > # DuckDB Q32:
   > 0.5369797919993289
   > 0.44854350000969134
   > 0.41927954100538045
   > 
   > # DataFusion main(780cccb52)
   > 0.620
   > 0.400
   > 0.409
   > ```
   > 
   > But for Q17, we are still behind:
   > 
   > ```
   > # DuckDB
   > 0.5953990409907419
   > 0.5309897500119405
   > 0.5242392499931157
   > 
   > # DataFusion main(780cccb52)
   > 1.145
   > 1.072
   > 1.082
   > ```
   > 
   > We would probably need to consolidate Aggregate(Partial and Final) and 
Repartition into a single place in order to be able to adaptively choose 
aggregate mode/algorithm based on runtime statistics.
   
   I see the Q32 improvement in the later PR #11792, and I guess the reason 
performance improved may be similar to the partial-aggregation skipping? Maybe 
Q17/Q18 improved for a different reason than Q32?
   
   I agree; maybe we should adopt a similar mechanism that selects the 
merging mode dynamically, like `duckdb`.
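
To make the idea concrete, here is a minimal sketch of choosing the mode from runtime statistics. It is only one possible heuristic, not the actual logic in DataFusion or DuckDB. The names `AggMode` and `choose_mode`, and the parameters `min_probe_rows` and `ratio_threshold`, are hypothetical and exist only for illustration.

```rust
/// Hypothetical aggregation strategies (illustrative names only,
/// not DataFusion or DuckDB types).
#[derive(Debug, PartialEq)]
enum AggMode {
    /// Pre-aggregate within each partition, then repartition and merge.
    PartialThenFinal,
    /// Skip pre-aggregation and forward rows directly to the final stage.
    SkipPartial,
}

/// Pick a mode from statistics gathered while probing the first rows.
///
/// Assumption: a high ratio of distinct groups to input rows means the
/// partial stage reduces little data, so skipping it is likely cheaper.
fn choose_mode(
    rows_seen: usize,
    groups_seen: usize,
    min_probe_rows: usize, // assumed tuning knob
    ratio_threshold: f64,  // assumed tuning knob
) -> AggMode {
    // Not enough rows observed yet: keep the default two-stage plan.
    if rows_seen < min_probe_rows {
        return AggMode::PartialThenFinal;
    }
    let ratio = groups_seen as f64 / rows_seen as f64;
    if ratio >= ratio_threshold {
        AggMode::SkipPartial
    } else {
        AggMode::PartialThenFinal
    }
}
```

With these assumed knobs, `choose_mode(100_000, 95_000, 10_000, 0.8)` would pick `SkipPartial` (almost every row is its own group), while `choose_mode(100_000, 500, 10_000, 0.8)` would keep `PartialThenFinal`.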
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

