alamb opened a new issue, #5505: URL: https://github.com/apache/arrow-datafusion/issues/5505
# Call to action:

Let's invest more effort in DataFusion benchmarking, both as a mechanism for technical evangelism and as a guide for actual performance improvements.

# Background

We have several examples of performance “comparisons” showing DataFusion doing poorly against DuckDB or pola.rs that were really tests of how fast CSV or JSON parsing can go ([this blog](https://www.confessionsofadataguy.com/dataframe-showdown-polars-vs-spark-vs-pandas-vs-datafusion-guess-who-wins/) is one such example). Recent work should make these comparisons much more favorable in the future.

It is in the interest of all projects based on DataFusion to focus on their own users and use cases rather than having to explain why they are using supposedly "inferior" technology due to misleading benchmark results (for example, recently on ClickBench; see https://github.com/apache/arrow-datafusion/issues/5276).

Of course, improved benchmarking will not only help evangelize DataFusion, it will also directly help guide the community’s optimization efforts.

# Task List

- [ ] https://github.com/apache/arrow-datafusion/issues/5276
- [ ] https://github.com/apache/arrow-datafusion/issues/5502
- [ ] https://github.com/apache/arrow-datafusion/issues/5504
