alamb commented on issue #7000: URL: https://github.com/apache/datafusion/issues/7000#issuecomment-2706466646
Thanks @IshaGudewar!

What I would suggest starting with is familiarizing yourself with the ClickBench benchmarks and how to run them
- https://github.com/apache/datafusion/issues/14586

Maybe start with a goal of understanding what DuckDB is doing with Q24 and Q26 that makes it 10x faster than DataFusion.

In general, profiling the queries and looking at what is taking the time is likely quite valuable.

Other ideas:

The last major thing I know of to improve grouping performance would be
- https://github.com/apache/datafusion/issues/9562

@Rachelint had some great results in the following PR, but the complexity was getting away from us
- https://github.com/apache/datafusion/pull/11943

In other words, that is likely not a great first issue.

More foundational work would be this project (but it will require some low-level, fiddly code in arrow-rs)
- https://github.com/apache/datafusion/issues/7957
- https://github.com/apache/arrow-rs/issues/6692

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
