alamb opened a new issue, #7209:
URL: https://github.com/apache/arrow-datafusion/issues/7209

   ### Is your feature request related to a problem or challenge?
   
   Follow on to #7052 
   There is an interesting database benchmark called the "H2O.ai database-like 
benchmark" that DuckDB seems to have revived (perhaps because the original went 
dormant, leaving very old and slow DuckDB results). More background 
here: https://duckdb.org/2023/04/14/h2oai.html#results
   
   @Dandandan  added a new solution for datafusion here: 
https://github.com/duckdblabs/db-benchmark/pull/18
   
   However, there is no easy way to run the h2o benchmark within the datafusion 
repo. There is an old version of some of these benchmarks in the code: 
https://github.com/apache/arrow-datafusion/blob/main/benchmarks/src/bin/h2o.rs
   
   
   ### Describe the solution you'd like
   
   I would like someone to make it easy to run the h2o.ai benchmark in the 
datafusion repo.
   
   Ideally this would look like
   ```shell
   # generate data
   ./benchmarks/bench.sh data h2o.ai
   # run
   ./benchmarks/bench.sh run h2o.ai
   ```
   
   I would expect to be able to run the individual queries like this
   
   ```shell
   cargo run --bin dfbench -- h2o.ai --query=3
   ```
   
   Some steps might be
   1. Port the existing benchmark script to `dfbench`, following the model in 
https://github.com/apache/arrow-datafusion/pull/7120
   2. Update `bench.sh`, following the model of existing benchmarks
   3. Update the documentation
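   
   As a rough illustration of step 2, the dispatch could look something like the
   sketch below. This is only a sketch under assumptions: the function names
   (`data_h2o`, `run_h2o`, `bench_main`) and the positional query argument are
   hypothetical and not taken from the real `bench.sh`, and data generation is
   stubbed out with an `echo`.
   
   ```shell
   #!/bin/sh
   # Hypothetical sketch: data_h2o, run_h2o and bench_main are illustrative
   # names, not functions taken from the real bench.sh.
   set -eu
   
   # Placeholder for data generation; a real version would create the files.
   data_h2o() {
       echo "would generate h2o.ai data"
   }
   
   # Print the dfbench command proposed in this issue for query number $1.
   run_h2o() {
       echo "cargo run --bin dfbench -- h2o.ai --query=$1"
   }
   
   # Dispatch "<command> <benchmark> [query]" to the matching function.
   bench_main() {
       # h20.ai is accepted as an alias because it is an easy typo for h2o.ai.
       case "${1:-}:${2:-}" in
           data:h2o.ai|data:h20.ai) data_h2o ;;
           run:h2o.ai|run:h20.ai)   run_h2o "${3:-1}" ;;
           *) echo "usage: bench.sh {data|run} h2o.ai [query]" >&2
              return 1 ;;
       esac
   }
   ```
   
   A real script would presumably end with `bench_main "$@"` and execute the
   command rather than print it.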
   
   
   ### Describe alternatives you've considered
   
   We could also simply remove the h2o.ai benchmark script, as it is not clear 
how important it will be long term.
   
   ### Additional context
   
   I think this is a good first issue as the task is clear, and there are 
existing patterns in `bench.sh`, `dfbench` and in 

