alamb commented on issue #7000:
URL: https://github.com/apache/datafusion/issues/7000#issuecomment-2706466646

   Thanks @IshaGudewar !
   
   What I would suggest starting with is familiarizing yourself with the ClickBench benchmarks and how to run them:
   - https://github.com/apache/datafusion/issues/14586
   
   Maybe start with the goal of understanding what DuckDB is doing with Q24 and Q26 that makes it 10x faster than DataFusion.
   
   In general, profiling the queries and looking at what is taking time is likely quite valuable.
   
   
   Other ideas
   
   The last major thing I know of to improve grouping performance would be:
   - https://github.com/apache/datafusion/issues/9562
   
   @Rachelint had some great results in the following PR, but the complexity was getting away from us:
   - https://github.com/apache/datafusion/pull/11943
   
   In other words, that is likely not a great first issue.
   
   
   More foundational work would be this project (but it will require some low-level, fiddly code in arrow-rs):
   - https://github.com/apache/datafusion/issues/7957
   - https://github.com/apache/arrow-rs/issues/6692

