findepi commented on code in PR #6662:
URL: https://github.com/apache/arrow-rs/pull/6662#discussion_r1830014557


##########
arrow-string/src/predicate.rs:
##########
@@ -116,10 +116,17 @@ impl<'a> Predicate<'a> {
             }),
             Predicate::Contains(finder) => {
                 if let Some(string_view_array) = 
array.as_any().downcast_ref::<StringViewArray>() {
+                    let nulls = string_view_array.logical_nulls();
                     BooleanArray::from(
                         string_view_array
                             .bytes_iter()
-                            .map(|haystack| finder.find(haystack).is_some() != 
negate)
+                            .enumerate()
+                            .map(|(idx, haystack)| {
+                                if nulls.as_ref().map(|n| 
n.is_null(idx)).unwrap_or_default() {

Review Comment:
   I did run benchmarks locally too. Thanks @alamb for the instructions.
   
   First, on the commit 16213501769371c7d2d6ce299755337b6db4d8fd where the PR 
branch starts
   
   ```
   git checkout
   16213501769371c7d2d6ce299755337b6db4d8fd
   caffeinate -ds cargo bench --bench comparison_kernels -- --save-baseline 
main like
   ```
   
   then on 1e82884f3d12f55260e007b0cb42795635d3b470, the tip of the PR branch
   ```
   git checkout findepi/fix-string-view-like-checks-with-null-values-1121b8
   caffeinate -ds cargo bench --bench comparison_kernels -- --baseline main like
   ```
   
   This command runs bechmark and produces comparison. Let me omit low changes 
(below 9%) for brevity, as well as outliers reporting.
   
   significant performance regressions:
   ```
   like_utf8view scalar ends with 4 bytes
                           time:   [36.885 ms 36.925 ms 36.972 ms]
                           change: [+77.568% +77.857% +78.161%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   
   like_utf8view scalar ends with 6 bytes
                           time:   [36.761 ms 36.818 ms 36.881 ms]
                           change: [+77.369% +77.776% +78.195%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   
   like_utf8view scalar ends with 13 bytes
                           time:   [36.537 ms 36.613 ms 36.694 ms]
                           change: [+78.308% +79.019% +79.752%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   
   like_utf8view scalar starts with 4 bytes
                           time:   [25.490 ms 25.542 ms 25.596 ms]
                           change: [+112.76% +113.25% +113.76%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   
   like_utf8view scalar starts with 6 bytes
                           time:   [36.556 ms 36.598 ms 36.645 ms]
                           change: [+84.943% +85.578% +86.205%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   
   like_utf8view scalar starts with 13 bytes
                           time:   [37.065 ms 37.105 ms 37.152 ms]
                           change: [+86.735% +87.424% +88.116%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   ```
   
   significant performance improvements:
   ```
   ilike_utf8 scalar equals
                           time:   [475.56 µs 476.17 µs 476.93 µs]
                           change: [-47.498% -47.406% -47.310%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   ilike_utf8 scalar contains
                           time:   [2.4461 ms 2.4500 ms 2.4552 ms]
                           change: [-15.221% -15.008% -14.786%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   ilike_utf8 scalar ends with
                           time:   [546.82 µs 547.76 µs 548.86 µs]
                           change: [-43.736% -43.605% -43.466%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   ilike_utf8 scalar starts with
                           time:   [560.40 µs 561.46 µs 562.67 µs]
                           change: [-43.349% -42.997% -42.641%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   Benchmarking ilike_utf8 scalar complex: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 5.4s, enable flat sampling, or reduce sample count to 60.
   ilike_utf8 scalar complex
                           time:   [1.0629 ms 1.0637 ms 1.0649 ms]
                           change: [-29.556% -29.299% -29.062%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   nilike_utf8 scalar equals
                           time:   [475.34 µs 476.23 µs 477.67 µs]
                           change: [-46.418% -46.259% -46.081%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   nilike_utf8 scalar contains
                           time:   [2.4427 ms 2.4449 ms 2.4477 ms]
                           change: [-15.899% -15.421% -15.120%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   nilike_utf8 scalar ends with
                           time:   [548.31 µs 548.96 µs 549.63 µs]
                           change: [-42.924% -42.742% -42.554%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   nilike_utf8 scalar starts with
                           time:   [557.88 µs 558.54 µs 559.35 µs]
                           change: [-42.683% -42.392% -42.144%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   Benchmarking nilike_utf8 scalar complex: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 5.4s, enable flat sampling, or reduce sample count to 60.
   nilike_utf8 scalar complex
                           time:   [1.0648 ms 1.0669 ms 1.0696 ms]
                           change: [-28.834% -28.600% -28.353%] (p = 0.00 < 
0.05)
                           Performance has improved.
   ```
   
   
   TL;DR 
   i must be doing something wrong here. the benchmarks report improvement for 
code path i didn't change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to