alamb commented on code in PR #6178:
URL: https://github.com/apache/arrow-rs/pull/6178#discussion_r1705580022
##########
arrow-buffer/src/buffer/mutable.rs:
##########
@@ -668,6 +668,56 @@ impl MutableBuffer {
}
}
+impl<'a> MutableBuffer {
+ /// Creates a [`MutableBuffer`] from an `&[u8]` [`Iterator`] with a
trusted (upper) length.
+ /// Prefer this to `collect` whenever possible, as it is faster.
+ /// # Panic
+ /// This method will panic if `&u8.len()` != `item_size`.
+ /// # Example
+ /// ```
+ /// # use arrow_buffer::MutableBuffer;
+ /// # use arrow_buffer::ToByteSlice;
+ /// let iter = vec![[1_u8, 2].to_byte_slice(), [3_u8,
4].to_byte_slice()].into_iter();
+ /// let buf = unsafe {
+ /// MutableBuffer::from_trusted_len_iter_slice_u8(iter, 2)
+ /// };
+ /// assert_eq!(4, buf.len());
+ /// ```
+ /// # Safety
+ /// This method assumes that the iterator's size is correct and is
undefined behavior
Review Comment:
I think it also assumes that each slice is `item_size` bytes
##########
arrow-select/src/filter.rs:
##########
@@ -707,6 +710,62 @@ fn filter_byte_view<T: ByteViewType>(
GenericByteViewArray::from(unsafe { builder.build_unchecked() })
}
+fn filter_fixed_size_binary(
+ array: &FixedSizeBinaryArray,
+ predicate: &FilterPredicate,
+) -> FixedSizeBinaryArray {
+ let values: &[u8] = array.values();
+ let value_length = array.value_length() as usize;
+ let calcualte_offset_from_index = |index: usize| index * value_length;
Review Comment:
There is a typo here:
```suggestion
let calculate_offset_from_index = |index: usize| index * value_length;
```
##########
arrow-buffer/src/buffer/mutable.rs:
##########
@@ -896,11 +946,19 @@ mod tests {
#[test]
fn test_from_trusted_len_iter() {
let iter = vec![1u32, 2].into_iter();
- let buf = unsafe { Buffer::from_trusted_len_iter(iter) };
+ let buf = unsafe { MutableBuffer::from_trusted_len_iter(iter) };
Review Comment:
why this change?
##########
arrow-select/src/filter.rs:
##########
@@ -985,6 +1044,25 @@ mod tests {
_test_filter_byte_view::<BinaryViewType>()
}
+ #[test]
+ fn test_filter_fixed_binary() {
+ let v1 = [1_u8, 2];
+ let v2 = [3_u8, 4];
+ let v3 = [5_u8, 6];
+ let v = vec![&v1, &v2, &v3];
+ let a = FixedSizeBinaryArray::from(v);
+ let b = BooleanArray::from(vec![true, false, true]);
Review Comment:
Given there are different code paths for the different types of filters (aka
different patterns in `BooleanArray`) Perhaps we can add some coverage to
ensure the other filter patterns are hit.
##########
arrow-select/src/filter.rs:
##########
@@ -707,6 +710,62 @@ fn filter_byte_view<T: ByteViewType>(
GenericByteViewArray::from(unsafe { builder.build_unchecked() })
}
+fn filter_fixed_size_binary(
+ array: &FixedSizeBinaryArray,
+ predicate: &FilterPredicate,
+) -> FixedSizeBinaryArray {
+ let values: &[u8] = array.values();
+ let value_length = array.value_length() as usize;
+ let calcualte_offset_from_index = |index: usize| index * value_length;
+ let buffer = match &predicate.strategy {
+ IterationStrategy::SlicesIterator => {
+ let mut buffer = MutableBuffer::with_capacity(predicate.count *
value_length);
+ for (start, end) in SlicesIterator::new(&predicate.filter) {
+ buffer.extend_from_slice(
+
&values[calcualte_offset_from_index(start)..calcualte_offset_from_index(end)],
+ );
+ }
+ buffer
+ }
+ IterationStrategy::Slices(slices) => {
+ let mut buffer = MutableBuffer::with_capacity(predicate.count *
value_length);
+ for (start, end) in slices {
+ buffer.extend_from_slice(
+
&values[calcualte_offset_from_index(*start)..calcualte_offset_from_index(*end)],
+ );
+ }
+ buffer
+ }
+ IterationStrategy::IndexIterator => {
+ let iter = IndexIterator::new(&predicate.filter,
predicate.count).map(|x| {
+
&values[calcualte_offset_from_index(x)..calcualte_offset_from_index(x + 1)]
+ });
+
+ // SAFETY: IndexIterator is trusted length
+ unsafe { MutableBuffer::from_trusted_len_iter_slice_u8(iter,
value_length) }
Review Comment:
I wonder how much better performance does this get than calculating
`MutableBuffer::extend_from_slice`? I looked at the implementation of
extend_from_slice and it does very much the same thing
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]