samuelcolvin commented on PR #6131:
URL: https://github.com/apache/arrow-rs/pull/6131#issuecomment-2254536638

   Running benches with filter `ilike.*contains` on GCP:
   
   ### haystack length 2
   
   ```
   group                          icontains-performance                  master
   -----                          ---------------------                  ------
   ilike_utf8 scalar contains     1.00    381.9±0.83µs        ? ?/sec    3.97   1515.2±7.72µs        ? ?/sec
   nilike_utf8 scalar contains    1.00    382.2±0.92µs        ? ?/sec    3.96   1514.7±6.07µs        ? ?/sec
   ```
   
   ### haystack length 0..400
   
   ```
   group                          icontains-performance                  master
   -----                          ---------------------                  ------
   ilike_utf8 scalar contains     1.00     22.2±0.08ms        ? ?/sec    1.02     22.7±0.10ms        ? ?/sec
   nilike_utf8 scalar contains    1.00     22.2±0.09ms        ? ?/sec    1.02     22.7±0.12ms        ? ?/sec
   ```
   
   I.e. this is roughly 4x faster when the haystack is short (or, I guess, equivalently when there's a match early in the haystack) and no worse for longer haystacks.
   
   Hopefully, in the long term, @BurntSushi implements an ASCII case-insensitive finder in `memchr`, as he mentioned in https://github.com/BurntSushi/memchr/pull/156#issuecomment-2246352599, and we can replace this with something that's faster in all cases.
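
   For reference, an ASCII case-insensitive "contains" can be sketched naively as below. This is only an illustration of the operation being benchmarked, not the code in this PR and not anything from `memchr`; the function name `contains_ignore_ascii_case` is made up for the example. It only uses the standard library (`slice::windows` and `<[u8]>::eq_ignore_ascii_case`).

   ```rust
   /// Hypothetical helper (not from arrow-rs or memchr): returns true if
   /// `needle` occurs in `haystack`, ignoring ASCII case only.
   fn contains_ignore_ascii_case(haystack: &str, needle: &str) -> bool {
       let h = haystack.as_bytes();
       let n = needle.as_bytes();
       // An empty needle matches everything; this also avoids calling
       // `windows(0)`, which would panic.
       if n.is_empty() {
           return true;
       }
       // Compare every window of `n.len()` bytes against the needle.
       // `any` stops at the first match, so an early match returns early.
       // If the needle is longer than the haystack there are no windows.
       h.windows(n.len()).any(|w| w.eq_ignore_ascii_case(n))
   }
   ```

   A scan like this is O(haystack × needle) in the worst case, which is why a vectorised finder in `memchr` would be the better long-term option.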


