findepi commented on code in PR #6662:
URL: https://github.com/apache/arrow-rs/pull/6662#discussion_r1830088891


##########
arrow-string/src/predicate.rs:
##########
@@ -116,10 +116,17 @@ impl<'a> Predicate<'a> {
             }),
             Predicate::Contains(finder) => {
                 if let Some(string_view_array) = array.as_any().downcast_ref::<StringViewArray>() {
+                    let nulls = string_view_array.logical_nulls();
                     BooleanArray::from(
                         string_view_array
                             .bytes_iter()
-                            .map(|haystack| finder.find(haystack).is_some() != negate)
+                            .enumerate()
+                            .map(|(idx, haystack)| {
+                                if nulls.as_ref().map(|n| n.is_null(idx)).unwrap_or_default() {

Review Comment:
   I pushed performance-oriented changes.
   I still observe a regression in one case:
   
   ```
   like_utf8view scalar starts with 4 bytes
                           time:   [12.899 ms 12.939 ms 12.986 ms]
                           change: [+7.7003% +8.0298% +8.4320%] (p = 0.00 < 0.05)
                           Performance has regressed.
   ```
   
   I still see improvements for many ilike cases, e.g.:
   ```
   nilike_utf8 scalar ends with
                           time:   [547.64 µs 548.55 µs 549.63 µs]
                           change: [-42.656% -42.257% -41.801%] (p = 0.00 < 0.05)
                           Performance has improved.
   ```
   
   
   I don't think I know how to improve this further from here. @alamb @tustvold, please take a look.
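
For readers without the full diff, here is a minimal sketch of the per-element null check that the hunk above adds to the hot loop. It uses plain slices rather than arrow types. `contains`, `contains_with_nulls`, and the `Vec<Option<bool>>` result are hypothetical stand-ins, and returning `None` for a null slot is an assumption, since the hunk is cut off before the body of the `if`.

```rust
/// Hypothetical stand-in for `finder.find(haystack).is_some()`.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    // `windows(0)` panics, so an empty needle is handled first.
    needle.is_empty() || haystack.windows(needle.len()).any(|w| w == needle)
}

/// Mirrors the shape of the loop in the hunk: enumerate the haystacks,
/// consult the optional null mask per index, and only search valid slots.
/// `nulls[idx] == true` means slot `idx` is null (assumed convention).
fn contains_with_nulls(
    haystacks: &[&[u8]],
    nulls: Option<&[bool]>,
    needle: &[u8],
    negate: bool,
) -> Vec<Option<bool>> {
    haystacks
        .iter()
        .enumerate()
        .map(|(idx, haystack)| {
            // With no mask at all, every slot counts as valid.
            if nulls.map(|n| n[idx]).unwrap_or_default() {
                None // assumption: a null input slot yields a null output slot
            } else {
                Some(contains(haystack, needle) != negate)
            }
        })
        .collect()
}
```

The branch on the mask runs once per element, which is one plausible place for the extra cost in the `like_utf8view` case to come from; the sketch does not measure that.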


