Re: [PR] Implement specialized filter kernel for `FixedSizeByteArray` [arrow-rs]

via GitHub Thu, 08 Aug 2024 07:34:35 -0700


chloro-pn commented on code in PR #6178:
URL: https://github.com/apache/arrow-rs/pull/6178#discussion_r1709648179



##########
arrow-select/src/filter.rs:
##########
@@ -707,6 +710,62 @@ fn filter_byte_view<T: ByteViewType>(
     GenericByteViewArray::from(unsafe { builder.build_unchecked() })
 }
 
+fn filter_fixed_size_binary(
+    array: &FixedSizeBinaryArray,
+    predicate: &FilterPredicate,
+) -> FixedSizeBinaryArray {
+    let values: &[u8] = array.values();
+    let value_length = array.value_length() as usize;
+    let calcualte_offset_from_index = |index: usize| index * value_length;
+    let buffer = match &predicate.strategy {
+        IterationStrategy::SlicesIterator => {
+            let mut buffer = MutableBuffer::with_capacity(predicate.count * value_length);
+            for (start, end) in SlicesIterator::new(&predicate.filter) {
+                buffer.extend_from_slice(
+                    &values[calcualte_offset_from_index(start)..calcualte_offset_from_index(end)],
+                );
+            }
+            buffer
+        }
+        IterationStrategy::Slices(slices) => {
+            let mut buffer = MutableBuffer::with_capacity(predicate.count * value_length);
+            for (start, end) in slices {
+                buffer.extend_from_slice(
+                    &values[calcualte_offset_from_index(*start)..calcualte_offset_from_index(*end)],
+                );
+            }
+            buffer
+        }
+        IterationStrategy::IndexIterator => {
+            let iter = IndexIterator::new(&predicate.filter, predicate.count).map(|x| {
+                &values[calcualte_offset_from_index(x)..calcualte_offset_from_index(x + 1)]
+            });
+
+            // SAFETY: IndexIterator is trusted length
+            unsafe { MutableBuffer::from_trusted_len_iter_slice_u8(iter, value_length) }

Review Comment:
   I will implement both approaches and compare their performance tomorrow or 
   the day after.
   
   Is this considered premature optimization? To me, copying as many bytes as 
   possible in a single call is always preferable, setting aside the complexity 
   it introduces. I consider this a "best practice".
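
To make the trade-off concrete, here is a minimal standalone sketch, not the PR's code. It uses a plain `Vec<u8>` instead of `MutableBuffer`, and the function names `filter_by_slices` and `filter_by_indices` are hypothetical. It assumes the comparison is between copying each contiguous run of selected values in one call and copying one fixed-width value per selected index.

```rust
/// Copies each contiguous run of selected values with a single call.
/// `slices` holds half-open `(start, end)` ranges of value indices.
fn filter_by_slices(values: &[u8], value_length: usize, slices: &[(usize, usize)]) -> Vec<u8> {
    let count: usize = slices.iter().map(|(start, end)| end - start).sum();
    let mut out = Vec::with_capacity(count * value_length);
    for &(start, end) in slices {
        // One copy per run, however many values the run contains.
        out.extend_from_slice(&values[start * value_length..end * value_length]);
    }
    out
}

/// Copies one fixed-width value per selected index.
fn filter_by_indices(values: &[u8], value_length: usize, indices: &[usize]) -> Vec<u8> {
    let mut out = Vec::with_capacity(indices.len() * value_length);
    for &index in indices {
        // One copy of `value_length` bytes per selected value.
        out.extend_from_slice(&values[index * value_length..(index + 1) * value_length]);
    }
    out
}
```

Both functions produce the same bytes for the same selection; they differ only in how many copy calls are issued. Whether that difference is measurable likely depends on `value_length` and on how selective the filter is, which is what a benchmark would have to show.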



