alamb opened a new pull request, #8861: URL: https://github.com/apache/arrow-datafusion/pull/8861
## Which issue does this PR close?

Closes https://github.com/apache/arrow-datafusion/issues/8860

## Rationale for this change

I would like benchmarks that can show whether improvements such as https://github.com/apache/arrow-datafusion/pull/8827 and https://github.com/apache/arrow-datafusion/pull/8849 are significant.

## What changes are included in this PR?

Add a new set of "extended", DataFusion-specific ClickBench queries. To run them:

```shell
./benchmarks/bench.sh run clickbench_extended
```

Example:

```
***************************
DataFusion Benchmark Script
COMMAND: run
BENCHMARK: clickbench_extended
DATAFUSION_DIR: /Users/andrewlamb/Software/arrow-datafusion/benchmarks/..
BRACH_NAME: alamb_clickbench_extended
DATA_DIR: /Users/andrewlamb/Software/arrow-datafusion/benchmarks/data
RESULTS_DIR: /Users/andrewlamb/Software/arrow-datafusion/benchmarks/results/alamb_clickbench_extended
CARGO_COMMAND: cargo run --profile release-nonlto
***************************
RESULTS_FILE: /Users/andrewlamb/Software/arrow-datafusion/benchmarks/results/alamb_clickbench_extended/clickbench_extended.json
Running clickbench (1 file) extended benchmark...
Compiling datafusion-benchmarks v34.0.0 (/Users/andrewlamb/Software/arrow-datafusion/benchmarks)
Running `/Users/andrewlamb/Software/arrow-datafusion/target/release-nonlto/dfbench clickbench --iterations 5 --path /Users/andrewlamb/Software/arrow-datafusion/benchmarks/data/hits.parquet --queries-path /Users/andrewlamb/Software/arrow-datafusion/benchmarks/queries/clickbench/extended.sql -o /Users/andrewlamb/Software/arrow-datafusion/benchmarks/results/alamb_clickbench_extended/clickbench_extended.json`
Running benchmarks with the following options: RunOpt { query: None, common: CommonOpt { iterations: 5, partitions: None, batch_size: 8192, debug: false }, path: "/Users/andrewlamb/Software/arrow-datafusion/benchmarks/data/hits.parquet", queries_path: "/Users/andrewlamb/Software/arrow-datafusion/benchmarks/queries/clickbench/extended.sql", output_path: Some("/Users/andrewlamb/Software/arrow-datafusion/benchmarks/results/alamb_clickbench_extended/clickbench_extended.json") }
Q0: SELECT COUNT(DISTINCT "SearchPhrase"), COUNT(DISTINCT "MobilePhone"), COUNT(DISTINCT "MobilePhoneModel") FROM hits;
Query 0 iteration 0 took 5614.0 ms and returned 1 rows
Query 0 iteration 1 took 5652.6 ms and returned 1 rows
Query 0 iteration 2 took 5554.3 ms and returned 1 rows
Query 0 iteration 3 took 5511.4 ms and returned 1 rows
Query 0 iteration 4 took 5554.3 ms and returned 1 rows
Done
```

## Are these changes tested?

I tested this (and `clickbench_1`) manually.

## Are there any user-facing changes?

No, this is a development tool only.
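Since the goal is to tell whether a change is significant, it can help to condense the per-iteration timings in the example output into a spread per query. Below is a minimal sketch that does this with POSIX `awk`. `summarize_timings` is a hypothetical helper, not part of `bench.sh`, and it assumes the `Query N iteration M took X ms ...` line format seen in the example output, which is not a documented interface and may change.

```shell
# Hypothetical helper (not part of bench.sh): read benchmark output on stdin
# and print n/min/max/mean per query. Assumes lines shaped like
# "Query 0 iteration 3 took 5511.4 ms and returned 1 rows".
summarize_timings() {
  awk '
    /^Query [0-9]+ iteration [0-9]+ took [0-9.]+ ms/ {
      q = $2          # query number
      t = $6 + 0      # elapsed time in ms, forced to a number
      n[q]++
      sum[q] += t
      if (!(q in min) || t < min[q]) min[q] = t
      if (!(q in max) || t > max[q]) max[q] = t
    }
    END {
      # Note: "for (q in n)" visits queries in an unspecified order.
      for (q in n)
        printf "Q%s: n=%d min=%.1f max=%.1f mean=%.1f ms\n", q, n[q], min[q], max[q], sum[q] / n[q]
    }
  '
}
```

For example, `./benchmarks/bench.sh run clickbench_extended 2>&1 | summarize_timings` would print one summary line per query; lines that do not match the timing pattern (the banner, the `Q0: SELECT ...` echo, `Done`) are ignored. Looking at min/max alongside the mean gives a rough sense of run-to-run noise before attributing a difference to a code change.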
