Great idea, Andy.
By the way, I couldn't see any of the feedback, nor could I add any comments
in the doc :(

Anyway, here are some similar efforts (big data/SQL benchmarks) from other
communities:
Trino: https://github.com/trinodb/tpcds, https://github.com/trinodb/tpch
Spark: https://github.com/databricks/spark-sql-perf
AMPLab: https://amplab.cs.berkeley.edu/benchmark/

And some interesting reads:
https://www.cs.cmu.edu/~pavlo/papers/benchmarks-sigmod09.pdf
https://www.vldb.org/pvldb/vol13/p3285-gruenheid.pdf


On Tue, May 14, 2024 at 6:16 AM Andy Grove <andy.gr...@apple.com.invalid>
wrote:

> Thank you for the feedback on the proposal, which has all been positive.
>
> I have now created the repository, and I plan on creating some PRs this
> week to add some initial documentation and scripts.
>
>
> apache/datafusion-benchmarks: Apache DataFusion Benchmarks
> <https://github.com/apache/datafusion-benchmarks>
>
>
> Thanks,
>
> Andy.
>
>
>
> On May 12, 2024, at 8:54 AM, Andy Grove <andy.gr...@apple.com.INVALID>
> wrote:
>
> Hello,
>
> I would like to propose creating a new datafusion-benchmarks repository
> for shared documentation and scripts that can help with benchmarking
> efforts across DataFusion and its subprojects. Please let me know your
> thoughts in the attached Google document.
>
> Thanks,
>
> Andy.
>
>
>
>
> DataFusion Benchmarking Repository Proposal
> <https://docs.google.com/document/d/17qh8ydqlJfR9_7eZ5HEsKu7xQRw-GPhFCPtZqAHAPak/edit?usp=sharing>
