samuelcolvin opened a new pull request, #6128:
URL: https://github.com/apache/arrow-rs/pull/6128

   # Which issue does this PR close?
   
   This will have some trivial conflicts after #6118; it's part of #6107.
   
   # Rationale for this change
    
   There's lots of context in #6107; this PR makes `LIKE` queries that are simple "contains" matches (patterns of the form `%needle%`) significantly faster.
   
   Running
   
   ```bash
   cargo bench -p arrow --bench comparison_kernels -F test_utils -- 'like.*contains'
   ```
   
   With the default haystack size of 4 characters this gives at most a ~50% speedup, but if you increase the haystack size to 400, the improvement is quite marked!
   
   ```
   ➤ cargo bench -p arrow --bench comparison_kernels -F test_utils -- 'like.*contains'
      Compiling arrow-string v52.2.0 (/Users/samuel/code/arrow-rs/arrow-string)
      Compiling arrow v52.2.0 (/Users/samuel/code/arrow-rs/arrow)
       Finished `bench` profile [optimized] target(s) in 4.09s
        Running benches/comparison_kernels.rs (target/release/deps/comparison_kernels-95ab196215ed59e6)
   like_utf8 scalar contains
                           time:   [888.14 µs 889.44 µs 891.07 µs]
                           change: [-80.213% -80.152% -80.091%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 14 outliers among 100 measurements (14.00%)
     5 (5.00%) high mild
     9 (9.00%) high severe
   
   Benchmarking like_utf8view scalar contains: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.9s, or reduce sample count to 50.
   like_utf8view scalar contains
                           time:   [89.263 ms 89.328 ms 89.402 ms]
                           change: [-50.303% -50.217% -50.124%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   
   nlike_utf8 scalar contains
                           time:   [890.63 µs 895.93 µs 902.48 µs]
                           change: [-80.369% -80.298% -80.223%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) high mild
     6 (6.00%) high severe
   
   ilike_utf8 scalar contains
                           time:   [31.714 ms 31.793 ms 31.876 ms]
                           change: [+1.9043% +2.1414% +2.4089%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   nilike_utf8 scalar contains
                           time:   [31.459 ms 31.508 ms 31.560 ms]
                           change: [+0.8653% +1.0474% +1.2321%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   ```
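Criterion's `change` line is the relative change in time versus the previous run (negative means faster), so it converts to a speedup factor of old/new = 1 / (1 + change). For example, -80% is roughly a 5x speedup and -50% roughly 2x. A small sketch of that arithmetic, using the midpoint estimates from the output above (`speedup` is a hypothetical helper, not part of criterion):

```rust
/// Convert a criterion `change` percentage (relative change in time,
/// negative = faster) into a speedup factor old_time / new_time.
fn speedup(change_pct: f64) -> f64 {
    1.0 / (1.0 + change_pct / 100.0)
}

fn main() {
    // Midpoint `change` estimates copied from the benchmark output.
    let results = [
        ("like_utf8 scalar contains", -80.152),
        ("like_utf8view scalar contains", -50.217),
        ("nlike_utf8 scalar contains", -80.298),
        ("ilike_utf8 scalar contains", 2.1414),
        ("nilike_utf8 scalar contains", 1.0474),
    ];
    for (name, change) in results {
        // Values above 1.0 are speedups; below 1.0 are slowdowns.
        println!("{name}: {:.2}x", speedup(change));
    }
}
```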
   
   # What changes are included in this PR?
   
   Use [`memchr`](https://docs.rs/memchr/latest/memchr/index.html) (which was already a dependency) instead of `str::contains` for "contains" matching.
   
   # Are there any user-facing changes?
   
   AFAIK, no.
   

