alamb commented on code in PR #9986:
URL: https://github.com/apache/arrow-rs/pull/9986#discussion_r3276084881
##########
arrow-select/src/filter.rs:
##########
@@ -824,17 +825,10 @@ where
IterationStrategy::All | IterationStrategy::None => unreachable!(),
}
- let mut builder = ArrayDataBuilder::new(T::DATA_TYPE)
- .len(predicate.count)
- .add_buffer(filter.dst_offsets.into())
- .add_buffer(filter.dst_values.into());
+ let offsets = unsafe {
OffsetBuffer::new_unchecked(filter.dst_offsets.into()) };
Review Comment:
Here it would also be nice to comment about why this is safe (what
assumptions it relies on). However, I see the existing code doesn't have a
safety comment
```rust
// Safety: offsets are correctly constructed
```
##########
arrow-select/src/filter.rs:
##########
@@ -579,6 +580,14 @@ fn filter_null_mask(
Some((null_count, nulls))
}
+/// Filters `nulls` and reuses the computed `null_count` to avoid scanning the
bitmap.
+fn filter_nulls(nulls: Option<&NullBuffer>, predicate: &FilterPredicate) ->
Option<NullBuffer> {
+ let (null_count, nulls) = filter_null_mask(nulls, predicate)?;
+ let buffer = BooleanBuffer::new(nulls, 0, predicate.count);
+
+ Some(unsafe { NullBuffer::new_unchecked(buffer, null_count) })
Review Comment:
can we please add a safety comment here explaining why this is safe:
```suggestion
// Safety: null_count return from filter_null_mas is correct
Some(unsafe { NullBuffer::new_unchecked(buffer, null_count) })
```
It might also be nice to add a debug assert here to verify in debug builds
```rust
debug_assert_eq!(null_count, nulls.num_zeros())
```
##########
arrow-select/src/filter.rs:
##########
@@ -843,17 +837,11 @@ fn filter_byte_view<T: ByteViewType>(
predicate: &FilterPredicate,
) -> GenericByteViewArray<T> {
let new_view_buffer = filter_native(array.views(), predicate);
+ let views = ScalarBuffer::new(new_view_buffer, 0, predicate.count);
Review Comment:
Here (and other places) you can probably use the unchecked variants too to
skip some checks, if we need to get additional speed
(`ScalarBuffer::new_unchecked`)
However, given your PR removes an allocation (the buffers array) I suspect
this is already going to be faster and avoiding unsafe is a nice bonus ❤️
##########
arrow-select/src/filter.rs:
##########
@@ -579,6 +580,14 @@ fn filter_null_mask(
Some((null_count, nulls))
}
+/// Filters `nulls` and reuses the computed `null_count` to avoid scanning the
bitmap.
+fn filter_nulls(nulls: Option<&NullBuffer>, predicate: &FilterPredicate) ->
Option<NullBuffer> {
Review Comment:
What do you think about making this a method on `FilterPredicate`? That
would make it easier to find / reuse I think.
```rust
impl FilterPredicate {
fn filter_nulls(&self, nulls: Option<&NullBuffer>) -> Option<NullBuffer> {
...
}
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]