Re: [PR] Undo run end filter performance regression [arrow-rs]

via GitHub Sat, 09 Nov 2024 06:52:37 -0800


delamarch3 commented on PR #6691:
URL: https://github.com/apache/arrow-rs/pull/6691#issuecomment-2466246617


   I ran the `filter_kernel` benchmark that I added for the run array in https://github.com/apache/arrow-rs/pull/6706 against each of the approaches. Here are the results:
   
   ```rust
   for pred in filter_values
       .iter()
       .skip(start as usize)
       .take((end - start) as usize)
   {
       count += R::Native::from(pred);
       keep |= pred
   }
   ```
   ```text
   Benchmarking filter run array (kept 1/2): Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 52.1s, or reduce sample count to 10.
   filter run array (kept 1/2)
                           time:   [542.98 ms 549.50 ms 556.59 ms]
   Found 10 outliers among 100 measurements (10.00%)
     7 (7.00%) high mild
     3 (3.00%) high severe
   
   Benchmarking filter run array high selectivity (kept 1023/1024): Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 54.3s, or reduce sample count to 10.
   Benchmarking filter run array high selectivity (kept 1023/1024): Collecting 100 samples in estimated 54.256 s (100 iterations
   filter run array high selectivity (kept 1023/1024)
                           time:   [550.25 ms 555.80 ms 561.74 ms]
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high mild
   
   Benchmarking filter run array low selectivity (kept 1/1024): Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 53.5s, or reduce sample count to 10.
   filter run array low selectivity (kept 1/1024)
                           time:   [536.14 ms 540.44 ms 545.14 ms]
   Found 11 outliers among 100 measurements (11.00%)
     6 (6.00%) high mild
     5 (5.00%) high severe
   ```
   
   ```rust
   for _ in start..end {
       if let Some(pred) = preds.next() {
           count += R::Native::from(pred);
           keep |= pred
       }
   }
   ```
   ```text
   filter run array (kept 1/2)
                           time:   [598.70 µs 601.93 µs 605.25 µs]
                           change: [-99.892% -99.890% -99.889%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high severe
   
   Benchmarking filter run array high selectivity (kept 1023/1024): Collecting 100 samples in estimated 6.0573 s (15k iterations
   filter run array high selectivity (kept 1023/1024)
                           time:   [386.55 µs 388.17 µs 389.91 µs]
                           change: [-99.931% -99.930% -99.929%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   filter run array low selectivity (kept 1/1024)
                           time:   [239.93 µs 240.46 µs 241.04 µs]
                           change: [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     6 (6.00%) high mild
     6 (6.00%) high severe
   ```
   
   The second and third approaches perform similarly, but after a few runs the low selectivity benchmark seems slightly faster with the one below:
   ```rust
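   // Clamp `end` to the filter length so every index below is in bounds.
   end -= end.saturating_sub(filter_values.len() as u64);
   // SAFETY: `i < end <= filter_values.len()` after the clamp above.
   for pred in (start..end).map(|i| unsafe { filter_values.value_unchecked(i as usize) }) {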
       count += R::Native::from(pred);
       keep |= pred
   }
   ```
   ```text
   filter run array (kept 1/2)
                           time:   [581.12 µs 584.01 µs 586.90 µs]
                           change: [-2.5195% -1.1178% +0.1036%] (p = 0.11 > 0.05)
                           No change in performance detected.
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking filter run array high selectivity (kept 1023/1024): Collecting 100 samples in estimated 5.5900 s (15k iterations
   filter run array high selectivity (kept 1023/1024)
                           time:   [359.79 µs 361.40 µs 363.47 µs]
                           change: [-7.7904% -5.5816% -3.1503%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 14 outliers among 100 measurements (14.00%)
     3 (3.00%) high mild
     11 (11.00%) high severe
   
   filter run array low selectivity (kept 1/1024)
                           time:   [209.87 µs 210.45 µs 211.09 µs]
                           change: [-13.950% -13.255% -12.616%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     3 (3.00%) high mild
     3 (3.00%) high severe
   ```
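
For readers without the PR open, the three loop shapes above can be compared side by side in a minimal standalone sketch. Everything here is an assumption made for illustration: a plain `&[bool]` stands in for the boolean filter array, `u64` stands in for `R::Native`, and the function names and `RunStats` alias are hypothetical. The third variant uses safe indexing instead of `value_unchecked`. One plausible reason for the gap in the first benchmark (not stated in the comment) is that `skip` on an iterator without a constant-time `nth` walks from the front for every run. A slice iterator does not have that cost, so this sketch shows only that the three shapes compute the same result, not the timing difference.

```rust
/// Per-run result: number of set predicate bits, and whether any bit was set.
type RunStats = Vec<(u64, bool)>;

/// Shape 1: build a fresh `skip(..).take(..)` iterator for every run.
pub fn count_with_skip(run_ends: &[u64], preds: &[bool]) -> RunStats {
    let mut start = 0u64;
    let mut out = Vec::with_capacity(run_ends.len());
    for &end in run_ends {
        let (mut count, mut keep) = (0u64, false);
        for &pred in preds.iter().skip(start as usize).take((end - start) as usize) {
            count += u64::from(pred);
            keep |= pred;
        }
        out.push((count, keep));
        start = end;
    }
    out
}

/// Shape 2: one shared iterator that is advanced across all runs.
pub fn count_with_shared_iter(run_ends: &[u64], preds: &[bool]) -> RunStats {
    let mut iter = preds.iter().copied();
    let mut start = 0u64;
    let mut out = Vec::with_capacity(run_ends.len());
    for &end in run_ends {
        let (mut count, mut keep) = (0u64, false);
        for _ in start..end {
            // The filter may be shorter than the last run end; stop counting then.
            if let Some(pred) = iter.next() {
                count += u64::from(pred);
                keep |= pred;
            }
        }
        out.push((count, keep));
        start = end;
    }
    out
}

/// Shape 3: clamp the run end to the filter length, then index directly.
pub fn count_with_index(run_ends: &[u64], preds: &[bool]) -> RunStats {
    let mut start = 0u64;
    let mut out = Vec::with_capacity(run_ends.len());
    for &run_end in run_ends {
        // Same effect as `end -= end.saturating_sub(len)` in the snippet above.
        let end = run_end.min(preds.len() as u64);
        let (mut count, mut keep) = (0u64, false);
        for i in start..end {
            let pred = preds[i as usize];
            count += u64::from(pred);
            keep |= pred;
        }
        out.push((count, keep));
        start = run_end;
    }
    out
}
```

Calling each function with run ends `[2, 5, 6, 10]` and an eight-element filter exercises the case where the filter is shorter than the final run end, which the second and third shapes handle explicitly.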


