alamb commented on code in PR #9986:
URL: https://github.com/apache/arrow-rs/pull/9986#discussion_r3276084881


##########
arrow-select/src/filter.rs:
##########
@@ -824,17 +825,10 @@ where
         IterationStrategy::All | IterationStrategy::None => unreachable!(),
     }
 
-    let mut builder = ArrayDataBuilder::new(T::DATA_TYPE)
-        .len(predicate.count)
-        .add_buffer(filter.dst_offsets.into())
-        .add_buffer(filter.dst_values.into());
+    let offsets = unsafe { 
OffsetBuffer::new_unchecked(filter.dst_offsets.into()) };

Review Comment:
   Here it would also be nice to comment about why this is safe (what 
assumptions it relies on). However, I see the existing code doesn't have a 
safety comment 
   
   ```rust
   // Safety: offsets are correctly constructed
   ```



##########
arrow-select/src/filter.rs:
##########
@@ -579,6 +580,14 @@ fn filter_null_mask(
     Some((null_count, nulls))
 }
 
+/// Filters `nulls` and reuses the computed `null_count` to avoid scanning the 
bitmap.
+fn filter_nulls(nulls: Option<&NullBuffer>, predicate: &FilterPredicate) -> 
Option<NullBuffer> {
+    let (null_count, nulls) = filter_null_mask(nulls, predicate)?;
+    let buffer = BooleanBuffer::new(nulls, 0, predicate.count);
+
+    Some(unsafe { NullBuffer::new_unchecked(buffer, null_count) })

Review Comment:
   can we please add a safety comment here explaining why this is safe:
   
   
   ```suggestion
       // Safety: null_count return from filter_null_mas is correct
       Some(unsafe { NullBuffer::new_unchecked(buffer, null_count) })
   ```
   
   It might also be nice to add a debug assert here to verify in debug builds
   
   ```rust
   debug_assert_eq!(null_count, nulls.num_zeros())
   ```



##########
arrow-select/src/filter.rs:
##########
@@ -843,17 +837,11 @@ fn filter_byte_view<T: ByteViewType>(
     predicate: &FilterPredicate,
 ) -> GenericByteViewArray<T> {
     let new_view_buffer = filter_native(array.views(), predicate);
+    let views = ScalarBuffer::new(new_view_buffer, 0, predicate.count);

Review Comment:
   Here (and other places) you can probably use the unchecked variants too to 
skip some checks, if we need to get additional speed 
(`ScalarBuffer::new_unchecked`)
   
   However, given your PR removes an allocation (the buffers array) I suspect 
this is already going to be faster and avoiding unsafe is a nice bonus ❤️ 



##########
arrow-select/src/filter.rs:
##########
@@ -579,6 +580,14 @@ fn filter_null_mask(
     Some((null_count, nulls))
 }
 
+/// Filters `nulls` and reuses the computed `null_count` to avoid scanning the 
bitmap.
+fn filter_nulls(nulls: Option<&NullBuffer>, predicate: &FilterPredicate) -> 
Option<NullBuffer> {

Review Comment:
   What do you think about making this a method on `FilterPredicate`? That 
would make it easier to find / reuse I think. 
   
   ```rust
   impl FilterPredicate { 
     fn filter_nulls(&self, nulls:  Option<&NullBuffer>) -> Option<NullBuffer> {
        ...
     }
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to