Omega359 opened a new pull request, #23052:
URL: https://github.com/apache/datafusion/pull/23052

   ## Which issue does this PR close?
   
   - Part of #21937.
   
   ## Rationale for this change
   
   Running SQL benchmarks using environment variables for configuration is 
awkward and error-prone, and using criterion exclusively, while statistically 
much more robust, is quite slow compared to using simple iterations.
   
   This PR is the first version of a benchmark runner for SQL benchmarks that 
will eventually use arguments for all benchmark configuration options.
   
   ## What changes are included in this PR?
   
   A simple benchmark runner that can list the SQL benchmarks and run a 
benchmark using either simple iterations or criterion, optionally restricted 
to a single query.
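
To illustrate what "simple iterations" means here compared with criterion's statistical sampling, the following is a minimal sketch and not the code in this PR. The names `time_iterations` and `mean` are hypothetical, and the closure stands in for executing one benchmark query.

```rust
use std::time::{Duration, Instant};

/// Runs `work` exactly `iterations` times and records the wall-clock
/// duration of each run. Hypothetical helper, not the runner's actual API.
fn time_iterations<F: FnMut()>(iterations: usize, mut work: F) -> Vec<Duration> {
    let mut timings = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        work();
        timings.push(start.elapsed());
    }
    timings
}

/// Arithmetic mean of the recorded timings, or `None` if nothing was run.
fn mean(timings: &[Duration]) -> Option<Duration> {
    if timings.is_empty() {
        return None;
    }
    let total: Duration = timings.iter().sum();
    Some(total / timings.len() as u32)
}
```

A fixed iteration count like this finishes quickly but gives no warm-up, outlier analysis, or confidence intervals, which is the trade-off against criterion described in the rationale.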
   
   Future enhancements will use arguments for benchmark configuration instead 
of relying only on environment variables, provide help output, and tie the 
runner into bench.sh.
   
   ## Are these changes tested?
   
   Yes. I have a script that tests all current SQL benchmarks both with and 
without criterion. Here is a portion of it for the single clickbench benchmark:
   ```
   # clickbench single basic long flags
   env DATA_DIR=data CLICKBENCH_TYPE=single cargo run -p datafusion-benchmarks \
     --bin benchmark_runner -- clickbench --query 0 --iterations 5 \
     --output results/benchmark_runner/clickbench_single_long.json
   
   # clickbench single basic short flags
   env DATA_DIR=data CLICKBENCH_TYPE=single cargo run -p datafusion-benchmarks \
     --bin benchmark_runner -- clickbench --query 0 -i 5 \
     -o results/benchmark_runner/clickbench_single_short.json
   
   # clickbench single basic env iterations
   env DATA_DIR=data CLICKBENCH_TYPE=single ITERATIONS=5 cargo run \
     -p datafusion-benchmarks --bin benchmark_runner -- clickbench --query 0 \
     --output results/benchmark_runner/clickbench_single_env_iterations.json
   
   # clickbench single criterion with baseline
   env DATA_DIR=data CLICKBENCH_TYPE=single cargo run -p datafusion-benchmarks \
     --bin benchmark_runner -- clickbench --query 0 --criterion \
     --save-baseline benchmark_runner_acceptance
   
   # clickbench single criterion without baseline
   env DATA_DIR=data CLICKBENCH_TYPE=single cargo run -p datafusion-benchmarks \
     --bin benchmark_runner -- clickbench --query 0 --criterion
   ```
   The existing `cargo bench` approach still works the same (criterion only):
   ```
   env DATA_DIR=data CLICKBENCH_TYPE=single BENCH_NAME=clickbench BENCH_QUERY=0 \
     cargo bench -p datafusion-benchmarks --bench sql
   ```
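
The commands above set the iteration count either with `--iterations`/`-i` or with the `ITERATIONS` environment variable. One way such a setting could be resolved is sketched below. This is an assumption for illustration only: the description does not state which source wins when both are given or what the default is, so the "flag first, then environment, then default" order and the `resolve_iterations` name are hypothetical.

```rust
/// Picks the iteration count. Assumed precedence (not confirmed by the PR):
/// an explicit flag value first, then the `ITERATIONS` environment value,
/// then the caller-supplied default.
fn resolve_iterations(
    flag: Option<usize>,
    env_value: Option<&str>,
    default: usize,
) -> Result<usize, String> {
    if let Some(n) = flag {
        return Ok(n);
    }
    match env_value {
        // Report a malformed environment value instead of silently ignoring it.
        Some(raw) => raw
            .trim()
            .parse::<usize>()
            .map_err(|e| format!("invalid ITERATIONS value {raw:?}: {e}")),
        None => Ok(default),
    }
}
```

A caller could pass `std::env::var("ITERATIONS").ok().as_deref()` as `env_value`, which keeps the function itself free of global state and easy to test.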
   
   ## Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

