alamb opened a new issue, #5561:
URL: https://github.com/apache/arrow-datafusion/issues/5561

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   When we make PRs like @jaylmiller's https://github.com/apache/arrow-datafusion/pull/5292 or #3463, we often want to know "does this make the existing benchmarks faster or slower?" To answer this question we would like to:
   1. Run benchmarks on `main`
   2. Run benchmarks on the PR
   3. Compare the results
   
   This workflow is well supported for the criterion-based microbenchmarks in https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/benches (by using criterion directly or by using https://github.com/BurntSushi/critcmp).
   
   However, for the "end to end" benchmarks in https://github.com/apache/arrow-datafusion/tree/main/benchmarks there is no easy way I know of to do two runs and compare the results.
   
   **Describe the solution you'd like**
   There is a "machine readable" output format generated with the `-o` parameter (as shown below).
   
   1. I would like a script that compares the output of two benchmark runs, ideally written in either bash or Python.
   2. Instructions on how to run the script added to 
https://github.com/apache/arrow-datafusion/tree/main/benchmarks
   
   So the workflow would be:
   
   ### Step 1: Create two or more output files using `-o`
   ```
   alamb@aal-dev:~/arrow-datafusion2/benchmarks$ cargo run --release --bin tpch 
-- benchmark datafusion --iterations 5 --path ~/tpch_data/parquet_data_SF1 
--format parquet -o main
   ```
   
   This produces files like those in [benchmarks.zip](https://github.com/apache/arrow-datafusion/files/10950794/benchmarks.zip). Here is an example:
   
   
   ```json
   {
     "context": {
       "benchmark_version": "19.0.0",
       "datafusion_version": "19.0.0",
       "num_cpus": 8,
       "start_time": 1678622986,
       "arguments": [
         "benchmark",
         "datafusion",
         "--iterations",
         "5",
         "--path",
         "/home/alamb/tpch_data/parquet_data_SF1",
         "--format",
         "parquet",
         "-o",
         "main"
       ]
     },
     "queries": [
       {
         "query": 1,
         "iterations": [
           {
             "elapsed": 1555.030709,
             "row_count": 4
           },
           {
             "elapsed": 1533.61753,
             "row_count": 4
           },
           {
             "elapsed": 1551.0951309999998,
             "row_count": 4
           },
           {
             "elapsed": 1539.953467,
             "row_count": 4
           },
           {
             "elapsed": 1541.992357,
             "row_count": 4
           }
         ],
         "start_time": 1678622986
       },
       ...
   
   ```
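   As a building block, reading one of these files and computing the average elapsed time per query could look roughly like the following (a minimal Python sketch, assuming the JSON layout shown above; the `elapsed` values appear to be milliseconds):
   
   ```python
   import json
   from statistics import mean
   
   
   def load_avg_times(path):
       """Return {query number: average elapsed time} for one benchmark output file.
   
       Assumes the JSON layout shown above: a top-level "queries" list whose
       entries have a "query" number and an "iterations" list of "elapsed"
       values (which appear to be milliseconds).
       """
       with open(path) as f:
           data = json.load(f)
       return {
           q["query"]: mean(i["elapsed"] for i in q["iterations"])
           for q in data["queries"]
       }
   ```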
   ### Step 2: Compare the two files and prepare a report
   
   ```shell
   benchmarks/compare_results branch.json main.json
   ```
   
   This would produce an output report of some type. Here is an example of such an output (from @korowa on https://github.com/apache/arrow-datafusion/pull/5490#issuecomment-1459826565). Maybe they have a script they could share.
   
   
   ```
   Query               branch         main
   ----------------------------------------------
   Query 1 avg time:   1047.93 ms     1135.36 ms
   Query 2 avg time:   280.91 ms      286.69 ms
   Query 3 avg time:   323.87 ms      351.31 ms
   Query 4 avg time:   146.87 ms      146.58 ms
   Query 5 avg time:   482.85 ms      463.07 ms
   Query 6 avg time:   274.73 ms      342.29 ms
   Query 7 avg time:   750.73 ms      762.43 ms
   Query 8 avg time:   443.34 ms      426.89 ms
   Query 9 avg time:   821.48 ms      775.03 ms
   Query 10 avg time:  585.21 ms      584.16 ms
   Query 11 avg time:  247.56 ms      232.90 ms
   Query 12 avg time:  258.51 ms      231.19 ms
   Query 13 avg time:  899.16 ms      885.56 ms
   Query 14 avg time:  300.63 ms      282.56 ms
   Query 15 avg time:  346.36 ms      318.97 ms
   Query 16 avg time:  198.33 ms      184.26 ms
   Query 17 avg time:  4197.54 ms     4101.92 ms
   Query 18 avg time:  2726.41 ms     2548.96 ms
   Query 19 avg time:  566.67 ms      535.74 ms
   Query 20 avg time:  1193.82 ms     1319.49 ms
   Query 21 avg time:  1027.00 ms     1050.08 ms
   Query 22 avg time:  120.03 ms      111.32 ms
   ```
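   
   A rough sketch of what such a script could look like in Python is below (the file name `compare_results.py`, the exact report layout, and the handling of missing queries are all illustrative assumptions, not a finished implementation). It reuses a parsing helper like the one sketched above:
   
   ```python
   #!/usr/bin/env python3
   """Sketch: compare_results.py branch.json main.json"""
   import json
   import sys
   from statistics import mean
   
   
   def load_avg_times(path):
       """Map query number -> average elapsed time (ms), assuming the format shown above."""
       with open(path) as f:
           data = json.load(f)
       return {
           q["query"]: mean(i["elapsed"] for i in q["iterations"])
           for q in data["queries"]
       }
   
   
   def main():
       branch_path, main_path = sys.argv[1], sys.argv[2]
       branch, base = load_avg_times(branch_path), load_avg_times(main_path)
   
       # Print a side-by-side report similar to the example above.
       print(f"{'Query':<20}{'branch':<15}{'main':<15}")
       print("-" * 50)
       for query in sorted(set(branch) | set(base)):
           b = f"{branch[query]:.2f} ms" if query in branch else "n/a"
           m = f"{base[query]:.2f} ms" if query in base else "n/a"
           print(f"{f'Query {query} avg time:':<20}{b:<15}{m:<15}")
   
   
   if __name__ == "__main__":
       main()
   ```
   
   A real version would probably also want to show the per-query ratio or percentage change, which is straightforward to add once the averages are in hand.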
   
   
   **Describe alternatives you've considered**
   Another possibility might be to move the specialized benchmark binaries into `criterion` (so they look like "microbench"es), but I think this is non-ideal because of the number of parameters supported by the benchmarks.
   
   
   **Additional context**
   
   
   

