alamb commented on issue #9404: URL: https://github.com/apache/arrow-datafusion/issues/9404#issuecomment-1986394436
> @alamb @kmitchener I went ahead and ran the benchmarks against 36.0.0 - the results are available [on this branch](https://github.com/pmcgleenon/ClickBench/tree/datafusion-36/datafusion)

Thank you very much @pmcgleenon

> I accidentally created a PR against the ClickBench repo. I was hoping to get your feedback on these results first before doing that....

I took a quick look at the results and in general they looked reasonable to me.

One thing I noticed is that there didn't seem to be a DataFusion 36 run for single file parquet (only partitioned).

Also, I wonder if we should remove the older DataFusion versions 🤔

## Initial performance observation

The only query that seems to have gotten substantially faster is Q9. I think the improvement is due to https://github.com/apache/arrow-datafusion/pull/8721 from @korowa

I was hoping to see a bigger improvement due to https://github.com/apache/arrow-datafusion/pull/8827 but it seems that has not yet been released, so we'll have to try again with DataFusion 37.

There are some queries that show a slight degradation in speed -- I am not sure if that is related to measurement variance or if we have increased our per-partition file overhead or something. It would be nice to see the numbers for single file and see if they show a similar pattern.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
