Omega359 opened a new pull request, #23052: URL: https://github.com/apache/datafusion/pull/23052
## Which issue does this PR close?

- Part of #21937.

## Rationale for this change

Running SQL benchmarks with environment variables for configuration is awkward and error-prone, and relying strictly on criterion, while statistically much better, is quite slow compared to running simple iterations. This PR is the first version of a benchmark runner for SQL benchmarks that will eventually take all benchmark configuration options as arguments.

## What changes are included in this PR?

A simple benchmark runner that can list the SQL benchmarks and run a benchmark using either simple iterations or criterion, optionally restricted to a single query. Future enhancements will move benchmark configuration from environment variables to arguments, add help output, and tie the runner into bench.sh.

## Are these changes tested?

Yes. I have a script that tests all current SQL benchmarks both with and without criterion.
Here is a portion of it for the single clickbench benchmark:

```
# clickbench single basic long flags
env DATA_DIR=data CLICKBENCH_TYPE=single cargo run -p datafusion-benchmarks --bin benchmark_runner -- clickbench --query 0 --iterations 5 --output results/benchmark_runner/clickbench_single_long.json

# clickbench single basic short flags
env DATA_DIR=data CLICKBENCH_TYPE=single cargo run -p datafusion-benchmarks --bin benchmark_runner -- clickbench --query 0 -i 5 -o results/benchmark_runner/clickbench_single_short.json

# clickbench single basic env iterations
env DATA_DIR=data CLICKBENCH_TYPE=single ITERATIONS=5 cargo run -p datafusion-benchmarks --bin benchmark_runner -- clickbench --query 0 --output results/benchmark_runner/clickbench_single_env_iterations.json

# clickbench single criterion with baseline
env DATA_DIR=data CLICKBENCH_TYPE=single cargo run -p datafusion-benchmarks --bin benchmark_runner -- clickbench --query 0 --criterion --save-baseline benchmark_runner_acceptance

# clickbench single criterion without baseline
env DATA_DIR=data CLICKBENCH_TYPE=single cargo run -p datafusion-benchmarks --bin benchmark_runner -- clickbench --query 0 --criterion
```

The existing `cargo bench` approach still works the same (criterion only):

```
env DATA_DIR=data CLICKBENCH_TYPE=single BENCH_NAME=clickbench BENCH_QUERY=0 cargo bench -p datafusion-benchmarks --bench sql
```

## Are there any user-facing changes?

No.
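The runner invocations above differ only in the flags that follow the benchmark name, so they can be generated from a small helper. The sketch below is not the author's test script: `runner_cmd` is a hypothetical function name, it only prints the command line instead of running it, and it uses only the flags and environment variables that appear in the commands above.

```shell
# Hypothetical helper (not part of the PR): prints a benchmark_runner
# invocation without executing it, so the command can be reviewed first.
runner_cmd() {
  # $1 = benchmark name, $2 = query number; remaining args pass through as-is.
  local bench="$1" query="$2"
  shift 2
  printf '%s\n' "env DATA_DIR=data CLICKBENCH_TYPE=single cargo run -p datafusion-benchmarks --bin benchmark_runner -- ${bench} --query ${query} $*"
}

# Simple-iteration mode, writing results to a JSON file.
runner_cmd clickbench 0 --iterations 5 --output results/benchmark_runner/clickbench_single_long.json

# Criterion mode, saving a named baseline.
runner_cmd clickbench 0 --criterion --save-baseline benchmark_runner_acceptance
```

Piping the printed lines to `sh` would execute them; keeping the helper print-only makes it cheap to confirm the flag combinations before starting a long benchmark run.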
