Re: [I] Update ClickBench benchmarks with DataFusion 36 [arrow-datafusion]

via GitHub Fri, 08 Mar 2024 12:48:41 -0800


alamb commented on issue #9404:
URL: 
https://github.com/apache/arrow-datafusion/issues/9404#issuecomment-1986394436


   > @alamb @kmitchener I went ahead and ran the benchmarks against 36.0.0 - 
the results are available [on this 
branch](https://github.com/pmcgleenon/ClickBench/tree/datafusion-36/datafusion)
   
   Thank you very much @pmcgleenon  
   
   > I accidentally created a PR against the ClickBench repo. I was hoping to 
get your feedback on these results first before doing that....
   
   I took a quick look at the results and in general they looked reasonable to 
me
   
   One thing I noticed is that there didn't seem to be a DataFusion 36 run for 
single file parquet (only partitioned). 
   
   Also, I wonder if we should remove the older datafusion versions 🤔 
   
   ## Initial performance observation
   
   The only query that seems to have gotten substantially faster is Q9: I think 
the improvement is due to https://github.com/apache/arrow-datafusion/pull/8721 
from @korowa 
   
   I was hoping to see a bigger improvement due to 
https://github.com/apache/arrow-datafusion/pull/8827 but it seems that has not 
yet been released, so we'll have to try again with DataFusion 37.
   
   Some queries show a slight degradation in speed. I am not sure if that is 
related to measurement variance or if we have increased our per-partition 
file overhead or something. It would be nice to see the numbers for the 
single file case and check whether they show a similar pattern.
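
One rough way to separate measurement variance from a real regression is to compare the best "hot" time per query between two runs and only flag changes beyond some tolerance. The sketch below is illustrative only: it assumes each run is a list of per-query timing lists (first try cold, later tries hot), the function name `compare_runs` and the 10% threshold are hypothetical, and the timings are made up rather than taken from the actual results.

```python
def compare_runs(old, new, threshold=0.10):
    """Flag queries whose best hot time changed by more than `threshold`.

    `old` and `new` are lists of per-query timing lists in seconds,
    e.g. [[cold, hot1, hot2], ...]. This layout is an assumption.
    Returns a list of (query_index, old_best, new_best, ratio) tuples.
    """
    changed = []
    for i, (old_times, new_times) in enumerate(zip(old, new)):
        # Skip the first (cold) try and take the fastest remaining one.
        old_best = min(old_times[1:])
        new_best = min(new_times[1:])
        ratio = new_best / old_best
        if abs(ratio - 1.0) > threshold:
            changed.append((i, old_best, new_best, ratio))
    return changed


# Made-up timings for three queries (NOT real benchmark numbers).
old_run = [[0.50, 0.40, 0.41], [1.20, 1.00, 1.02], [2.00, 1.90, 1.95]]
new_run = [[0.52, 0.42, 0.41], [0.70, 0.50, 0.51], [2.40, 2.30, 2.28]]

for idx, old_best, new_best, ratio in compare_runs(old_run, new_run):
    direction = "slower" if ratio > 1.0 else "faster"
    print(f"query {idx}: {old_best:.2f}s -> {new_best:.2f}s ({ratio:.2f}x, {direction})")
```

Running the same comparison on both the partitioned and the single file results would show whether the slowdowns appear in both, which would point away from per-partition overhead as the cause.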


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

