[PR] Add "Extended" clickbench queries [arrow-datafusion]

via GitHub Sun, 14 Jan 2024 04:34:28 -0800


alamb opened a new pull request, #8861:
URL: https://github.com/apache/arrow-datafusion/pull/8861


   ## Which issue does this PR close?
   
   Closes https://github.com/apache/arrow-datafusion/issues/8860
   
   ## Rationale for this change
   
   I would like to have benchmarks that let us show that improvements such as https://github.com/apache/arrow-datafusion/pull/8827 and https://github.com/apache/arrow-datafusion/pull/8849 are significant.
   
   ## What changes are included in this PR?
   Add new "Extended" DataFusion-specific ClickBench queries.
   
   To run:
   ```shell
   ./benchmarks/bench.sh run clickbench_extended
   ```
   
   Example:
   ```
   ***************************
   DataFusion Benchmark Script
   COMMAND: run
   BENCHMARK: clickbench_extended
   DATAFUSION_DIR: /Users/andrewlamb/Software/arrow-datafusion/benchmarks/..
   BRACH_NAME: alamb_clickbench_extended
   DATA_DIR: /Users/andrewlamb/Software/arrow-datafusion/benchmarks/data
   RESULTS_DIR: /Users/andrewlamb/Software/arrow-datafusion/benchmarks/results/alamb_clickbench_extended
   CARGO_COMMAND: cargo run --profile release-nonlto
   ***************************
   RESULTS_FILE: /Users/andrewlamb/Software/arrow-datafusion/benchmarks/results/alamb_clickbench_extended/clickbench_extended.json
   Running clickbench (1 file) extended benchmark...
      Compiling datafusion-benchmarks v34.0.0 (/Users/andrewlamb/Software/arrow-datafusion/benchmarks)
        Running `/Users/andrewlamb/Software/arrow-datafusion/target/release-nonlto/dfbench clickbench --iterations 5 --path /Users/andrewlamb/Software/arrow-datafusion/benchmarks/data/hits.parquet --queries-path /Users/andrewlamb/Software/arrow-datafusion/benchmarks/queries/clickbench/extended.sql -o /Users/andrewlamb/Software/arrow-datafusion/benchmarks/results/alamb_clickbench_extended/clickbench_extended.json`
   Running benchmarks with the following options: RunOpt { query: None, common: CommonOpt { iterations: 5, partitions: None, batch_size: 8192, debug: false }, path: "/Users/andrewlamb/Software/arrow-datafusion/benchmarks/data/hits.parquet", queries_path: "/Users/andrewlamb/Software/arrow-datafusion/benchmarks/queries/clickbench/extended.sql", output_path: Some("/Users/andrewlamb/Software/arrow-datafusion/benchmarks/results/alamb_clickbench_extended/clickbench_extended.json") }
   Q0: SELECT COUNT(DISTINCT "SearchPhrase"), COUNT(DISTINCT "MobilePhone"), COUNT(DISTINCT "MobilePhoneModel") FROM hits;
   Query 0 iteration 0 took 5614.0 ms and returned 1 rows
   Query 0 iteration 1 took 5652.6 ms and returned 1 rows
   Query 0 iteration 2 took 5554.3 ms and returned 1 rows
   Query 0 iteration 3 took 5511.4 ms and returned 1 rows
   Query 0 iteration 4 took 5554.3 ms and returned 1 rows
   Done
   ```
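
To compare runs before and after a change, the per-iteration lines in the output above can be reduced to a minimum and a mean per query. The following is only a sketch and not part of this PR: the `summarize` helper and the `EXAMPLE_LOG` constant are hypothetical names, and the regular expression assumes the exact "Query N iteration M took X ms and returned R rows" wording shown in the example.

```python
import re
from statistics import mean

# Matches the per-iteration lines printed in the example output above.
# Assumption: the wording of these lines is exactly as shown in that example.
ITERATION_RE = re.compile(
    r"Query (\d+) iteration (\d+) took ([\d.]+) ms and returned (\d+) rows"
)


def summarize(log_text):
    """Return {query_id: (min_ms, mean_ms)} for every query found in log_text.

    Hypothetical helper for illustration; it is not part of bench.sh or dfbench.
    """
    timings = {}
    for match in ITERATION_RE.finditer(log_text):
        query_id = int(match.group(1))
        elapsed_ms = float(match.group(3))
        timings.setdefault(query_id, []).append(elapsed_ms)
    return {query: (min(ms), mean(ms)) for query, ms in timings.items()}


# The five timing lines copied from the example output above.
EXAMPLE_LOG = """
Query 0 iteration 0 took 5614.0 ms and returned 1 rows
Query 0 iteration 1 took 5652.6 ms and returned 1 rows
Query 0 iteration 2 took 5554.3 ms and returned 1 rows
Query 0 iteration 3 took 5511.4 ms and returned 1 rows
Query 0 iteration 4 took 5554.3 ms and returned 1 rows
"""

if __name__ == "__main__":
    for query, (fastest, average) in summarize(EXAMPLE_LOG).items():
        print(f"Q{query}: min {fastest:.1f} ms, mean {average:.1f} ms")
```

Parsing the console text keeps the sketch independent of the layout of the JSON results file, which is not shown here.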
   
   ## Are these changes tested?
   I tested this (and clickbench_1) manually.
   
   ## Are there any user-facing changes?
   No, this is a development tool only.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
