Great idea, Andy. I couldn't see any of the feedback, btw, nor could I add any comments to the doc :(
Anyway, here are some similar efforts (big data/SQL benchmarks) from other communities:

- Trino: https://github.com/trinodb/tpcds, https://github.com/trinodb/tpch
- Spark: https://github.com/databricks/spark-sql-perf
- AMPLab: https://amplab.cs.berkeley.edu/benchmark/

And some interesting reads:

- https://www.cs.cmu.edu/~pavlo/papers/benchmarks-sigmod09.pdf
- https://www.vldb.org/pvldb/vol13/p3285-gruenheid.pdf

On Tue, May 14, 2024 at 6:16 AM Andy Grove <andy.gr...@apple.com.invalid> wrote:
> Thank you for the feedback on the proposal, which has all been positive.
>
> I have now created the repository, and I plan on creating some PRs this
> week to add some initial documentation and scripts.
>
> apache/datafusion-benchmarks: Apache DataFusion Benchmarks
> <https://github.com/apache/datafusion-benchmarks>
>
> Thanks,
>
> Andy.
>
> On May 12, 2024, at 8:54 AM, Andy Grove <andy.gr...@apple.com.INVALID>
> wrote:
>
> Hello,
>
> I would like to propose creating a new datafusion-benchmarks repository
> for shared documentation and scripts that can help with benchmarking
> efforts across DataFusion and its subprojects. Please let me know your
> thoughts in the attached Google document.
>
> Thanks,
>
> Andy.
>
> DataFusion Benchmarking Repository Proposal
> <https://docs.google.com/document/d/17qh8ydqlJfR9_7eZ5HEsKu7xQRw-GPhFCPtZqAHAPak/edit?usp=sharing>