[GitHub] [arrow-rs] ozgrakkurt commented on pull request #4441: perf(parquet): use optimized bloom filter

via GitHub Tue, 27 Jun 2023 15:38:51 -0700


ozgrakkurt commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1610313921


   On `aarch64`
   
   <details>
   
   <summary>Running `cargo bench` in the parquet folder, first on the master branch and then on the sbbf branch</summary>
   
   ```
   write_batch primitive/4096 values primitive
                           time:   [861.79 µs 873.72 µs 885.48 µs]
                           thrpt:  [198.69 MiB/s 201.36 MiB/s 204.15 MiB/s]
                    change:
                           time:   [+2.1325% +3.2924% +4.6073%] (p = 0.00 < 0.05)
                           thrpt:  [-4.4044% -3.1875% -2.0880%]
                           Performance has regressed.
   write_batch primitive/4096 values primitive with bloom filter
                           time:   [3.8160 ms 3.8739 ms 3.9339 ms]
                           thrpt:  [44.722 MiB/s 45.415 MiB/s 46.105 MiB/s]
                    change:
                           time:   [-32.210% -31.013% -29.875%] (p = 0.00 < 0.05)
                           thrpt:  [+42.603% +44.955% +47.515%]
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   write_batch primitive/4096 values primitive non-null
                           time:   [734.01 µs 739.56 µs 746.42 µs]
                           thrpt:  [231.12 MiB/s 233.27 MiB/s 235.03 MiB/s]
                    change:
                           time:   [+2.7312% +3.4277% +4.2323%] (p = 0.00 < 0.05)
                           thrpt:  [-4.0604% -3.3141% -2.6586%]
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   write_batch primitive/4096 values primitive non-null with bloom filter
                           time:   [3.3841 ms 3.4255 ms 3.4691 ms]
                           thrpt:  [49.729 MiB/s 50.362 MiB/s 50.978 MiB/s]
                    change:
                           time:   [-39.074% -38.127% -37.236%] (p = 0.00 < 0.05)
                           thrpt:  [+59.327% +61.621% +64.133%]
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     6 (6.00%) high mild
   write_batch primitive/4096 values bool
                           time:   [98.046 µs 98.965 µs 100.33 µs]
                           thrpt:  [10.570 MiB/s 10.716 MiB/s 10.816 MiB/s]
                    change:
                           time:   [+2.9694% +3.9493% +5.1195%] (p = 0.00 < 0.05)
                           thrpt:  [-4.8702% -3.7992% -2.8837%]
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   write_batch primitive/4096 values bool non-null
                           time:   [78.075 µs 79.162 µs 80.149 µs]
                           thrpt:  [7.1393 MiB/s 7.2282 MiB/s 7.3289 MiB/s]
                    change:
                           time:   [+0.0433% +2.4260% +5.4192%] (p = 0.07 > 0.05)
                           thrpt:  [-5.1406% -2.3685% -0.0432%]
                           No change in performance detected.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   write_batch primitive/4096 values string
                           time:   [403.61 µs 405.73 µs 408.72 µs]
                           thrpt:  [194.39 MiB/s 195.83 MiB/s 196.85 MiB/s]
                    change:
                           time:   [+1.4492% +2.3498% +3.5335%] (p = 0.00 < 0.05)
                           thrpt:  [-3.4129% -2.2959% -1.4285%]
                           Performance has regressed.
   Found 12 outliers among 100 measurements (12.00%)
     8 (8.00%) high mild
     4 (4.00%) high severe
   Benchmarking write_batch primitive/4096 values string with bloom filter: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60.
   write_batch primitive/4096 values string with bloom filter
                           time:   [1.1055 ms 1.1382 ms 1.1729 ms]
                           thrpt:  [67.741 MiB/s 69.804 MiB/s 71.873 MiB/s]
                    change:
                           time:   [-37.247% -35.553% -33.810%] (p = 0.00 < 0.05)
                           thrpt:  [+51.079% +55.167% +59.354%]
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   write_batch primitive/4096 values string dictionary
                           time:   [229.68 µs 230.35 µs 231.20 µs]
                           thrpt:  [207.00 MiB/s 207.77 MiB/s 208.37 MiB/s]
                    change:
                           time:   [+0.2089% +1.0188% +2.0040%] (p = 0.02 < 0.05)
                           thrpt:  [-1.9646% -1.0085% -0.2085%]
                           Change within noise threshold.
   Found 13 outliers among 100 measurements (13.00%)
     4 (4.00%) high mild
     9 (9.00%) high severe
   write_batch primitive/4096 values string dictionary with bloom filter
                           time:   [535.90 µs 545.12 µs 555.73 µs]
                           thrpt:  [86.120 MiB/s 87.796 MiB/s 89.305 MiB/s]
                    change:
                           time:   [-42.792% -40.776% -38.804%] (p = 0.00 < 0.05)
                           thrpt:  [+63.410% +68.850% +74.801%]
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     3 (3.00%) high mild
     4 (4.00%) high severe
   write_batch primitive/4096 values string non-null
                           time:   [483.04 µs 485.05 µs 487.48 µs]
                           thrpt:  [160.98 MiB/s 161.79 MiB/s 162.46 MiB/s]
                    change:
                           time:   [-2.2902% -1.5725% -0.8595%] (p = 0.00 < 0.05)
                           thrpt:  [+0.8669% +1.5976% +2.3439%]
                           Change within noise threshold.
   Found 16 outliers among 100 measurements (16.00%)
     6 (6.00%) high mild
     10 (10.00%) high severe
   Benchmarking write_batch primitive/4096 values string non-null with bloom filter: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 60.
   write_batch primitive/4096 values string non-null with bloom filter
                           time:   [1.1462 ms 1.1802 ms 1.2178 ms]
                           thrpt:  [64.442 MiB/s 66.495 MiB/s 68.466 MiB/s]
                    change:
                           time:   [-39.081% -37.298% -35.342%] (p = 0.00 < 0.05)
                           thrpt:  [+54.661% +59.483% +64.152%]
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     7 (7.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking write_batch nested/4096 values primitive list: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
   write_batch nested/4096 values primitive list
                           time:   [1.1292 ms 1.1307 ms 1.1322 ms]
                           thrpt:  [144.23 MiB/s 144.42 MiB/s 144.60 MiB/s]
                    change:
                           time:   [-1.3286% -0.8860% -0.4412%] (p = 0.00 < 0.05)
                           thrpt:  [+0.4431% +0.8939% +1.3465%]
                           Change within noise threshold.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) low mild
     3 (3.00%) high mild
     2 (2.00%) high severe
   Benchmarking write_batch nested/4096 values primitive list non-null: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50.
   write_batch nested/4096 values primitive list non-null
                           time:   [1.4193 ms 1.4229 ms 1.4272 ms]
                           thrpt:  [133.14 MiB/s 133.54 MiB/s 133.88 MiB/s]
                    change:
                           time:   [+4.4098% +4.7487% +5.1370%] (p = 0.00 < 0.05)
                           thrpt:  [-4.8860% -4.5335% -4.2235%]
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     3 (3.00%) low mild
     1 (1.00%) high mild
     3 (3.00%) high severe
   
   
   ```
   
   </details>
   
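   So something like a 35% gain on the bloom filter benchmarks (roughly 31% to 41% less time per iteration). The benchmarks without a bloom filter stay within about 5% of master.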
   I'll also test on an x86 machine in a little bit.
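
When reading the log, note that Criterion reports the change in time and the change in throughput separately, and the two percentages are not symmetric: for a fixed amount of work, a time change of t corresponds to a throughput change of 1/(1+t) - 1. The sketch below is only an illustration of that arithmetic, not part of the PR; the function and constant names are made up for this example, and the input values are the median time changes of the five "with bloom filter" benchmarks copied from the log.

```rust
/// Convert a relative change in run time (in percent) into the implied
/// relative change in throughput (in percent), assuming the amount of
/// work per iteration is unchanged.
fn throughput_change_pct(time_change_pct: f64) -> f64 {
    (1.0 / (1.0 + time_change_pct / 100.0) - 1.0) * 100.0
}

/// Arithmetic mean of a non-empty slice.
fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

/// Median time changes (percent) of the "with bloom filter" benchmarks,
/// copied from the Criterion log in this comment (hypothetical constant name).
const BLOOM_TIME_CHANGES_PCT: [f64; 5] = [-31.013, -38.127, -35.553, -40.776, -37.298];

fn main() {
    for t in BLOOM_TIME_CHANGES_PCT.iter() {
        println!(
            "time {:+.3}% -> throughput {:+.3}%",
            t,
            throughput_change_pct(*t)
        );
    }
    println!("mean time change: {:+.2}%", mean(&BLOOM_TIME_CHANGES_PCT));
}
```

For example, the -31.013% time change of the first bloom filter benchmark works out to about +44.96% throughput, in line with the `thrpt` figure Criterion printed. The mean of the five time changes is about -36.6%, which is where the "something like 35%" figure sits if it is read as a reduction in time rather than a gain in throughput.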


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
