samuelcolvin commented on PR #6131: URL: https://github.com/apache/arrow-rs/pull/6131#issuecomment-2254536638
Running benches with filter `ilike.*contains` on GCP:

### haystack length 2

```
group                          icontains-performance                  master
-----                          ---------------------                  ------
ilike_utf8 scalar contains     1.00    381.9±0.83µs        ? ?/sec    3.97  1515.2±7.72µs        ? ?/sec
nilike_utf8 scalar contains    1.00    382.2±0.92µs        ? ?/sec    3.96  1514.7±6.07µs        ? ?/sec
```

### haystack length 0..400

```
group                          icontains-performance                  master
-----                          ---------------------                  ------
ilike_utf8 scalar contains     1.00     22.2±0.08ms        ? ?/sec    1.02     22.7±0.10ms        ? ?/sec
nilike_utf8 scalar contains    1.00     22.2±0.09ms        ? ?/sec    1.02     22.7±0.12ms        ? ?/sec
```

I.e. this is very good (about 4x faster than master) when the haystack is short, or I guess equivalently when there's a match early in the haystack, and no worse for longer haystacks.

Hopefully, long term, @BurntSushi implements an ASCII case-insensitive finder in `memchr`, as he mentioned in https://github.com/BurntSushi/memchr/pull/156#issuecomment-2246352599, and we can replace this with something that's faster in all cases.
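For readers unfamiliar with the operation being benchmarked, here is a minimal sketch of an ASCII case-insensitive "contains" check. It is an illustration only and is not taken from the PR: the function name `ascii_icontains` is hypothetical, and the naive sliding-window approach is an assumption about what such a check can look like, not a description of the arrow-rs implementation.

```rust
/// Hypothetical helper (not from the PR): returns true if `needle` occurs
/// in `haystack`, ignoring ASCII case only. Non-ASCII bytes must match exactly.
fn ascii_icontains(haystack: &str, needle: &str) -> bool {
    let h = haystack.as_bytes();
    let n = needle.as_bytes();

    // An empty needle matches everywhere; this also avoids `windows(0)`,
    // which panics.
    if n.is_empty() {
        return true;
    }
    // A needle longer than the haystack can never match.
    if n.len() > h.len() {
        return false;
    }

    // Slide a needle-sized window over the haystack and compare each window
    // byte-wise, ignoring ASCII case. `any` stops at the first match, so an
    // early match or a short haystack means very little work.
    h.windows(n.len()).any(|w| w.eq_ignore_ascii_case(n))
}
```

With this sketch, `ascii_icontains("Hello World", "WORLD")` is true, while `ascii_icontains("ab", "abc")` is false. The worst case is proportional to haystack length times needle length, which is why a dedicated case-insensitive finder could plausibly do better on long haystacks.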
