alamb commented on code in PR #9338:
URL: https://github.com/apache/arrow-rs/pull/9338#discussion_r2766183359


##########
arrow-string/src/regexp.rs:
##########
@@ -200,20 +199,13 @@ where
         }
     }
 
-    let buffer = result.into();
-    let data = unsafe {
-        ArrayData::new_unchecked(
-            DataType::Boolean,
-            array.len(),
-            None,
-            null_bit_buffer,
-            0,
-            vec![buffer],
-            vec![],
-        )
-    };
-
-    Ok(BooleanArray::from(data))
+    let values = BooleanBuffer::from(result);
+    let nulls = array
+        .nulls()
+        .map(|n| n.inner().sliced())
+        .map(|b| NullBuffer::new(BooleanBuffer::new(b, 0, array.len())))
+        .filter(|n| n.null_count() > 0);

Review Comment:
   This pattern of creating a BooleanBuffer from with a length and then 
filtering for nulls count is quite common (and becoming more so)
   
   I wonder if we should (as a follow on PR) make a function to do it, maybe 
like 
   
   ```rust
   let Option<NullBuffer> = NullBuffer::try_from_unsliced(b, array.len())
   ```
   Then this code would look like 
   ```rust
   let nulls = array
           .nulls()
           .map(|n| n.inner().sliced())
           .map(|b| NullBuffer::try_from_unsliced(b, array.len()
   ```
   
   🤔 



##########
arrow-array/src/array/boolean_array.rs:
##########
@@ -520,8 +520,10 @@ impl BooleanArray {
         let data = val_builder.as_slice_mut();
 
         let null_slice = null_builder.as_slice_mut();
+        let mut non_nulls = 0usize;

Review Comment:
   This is very performance critical code. I'll run some benchmarks to make 
sure this doesn't mess something up by accident



##########
arrow-array/src/array/boolean_array.rs:
##########
@@ -520,8 +520,10 @@ impl BooleanArray {
         let data = val_builder.as_slice_mut();
 
         let null_slice = null_builder.as_slice_mut();
+        let mut non_nulls = 0usize;

Review Comment:
   Also it seems like this code previously didn't bother to check the null 
counts, so we could always leave the existing behavior as is 🤔 



##########
arrow-string/src/regexp.rs:
##########
@@ -180,7 +180,6 @@ pub fn regexp_is_match_scalar<'a, S>(
 where
     &'a S: StringArrayType<'a>,
 {
-    let null_bit_buffer = array.nulls().map(|x| x.inner().sliced());

Review Comment:
   I think it would be safest to leave the `sliced()` call in (avoid potential 
regressions) and make a follow on PR to try and remove it separately



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to