alamb commented on a change in pull request #8303:
URL: https://github.com/apache/arrow/pull/8303#discussion_r496930511



##########
File path: rust/arrow/src/compute/kernels/filter.rs
##########
@@ -353,15 +353,19 @@ impl FilterContext {
                         // foreach bit in batch:
                         if (filter_batch & self.filter_mask[j]) != 0 {
                             let data_index = (i * 64) + j;
-                            values.push(input_array.value(data_index));
+                            if input_array.is_null(data_index) {

Review comment:
       This is the same pattern as in the handling for primitive arrays: 
https://github.com/apache/arrow/pull/8303/files#diff-d7b0b7cde1850e8744ceda458c6dea81R294-L298

##########
File path: rust/arrow/src/compute/kernels/filter.rs
##########
@@ -373,7 +377,11 @@ impl FilterContext {
                         // foreach bit in batch:
                         if (filter_batch & self.filter_mask[j]) != 0 {
                             let data_index = (i * 64) + j;
-                            values.push(input_array.value(data_index));
+                            if input_array.is_null(data_index) {

Review comment:
       Likewise, this special case appears to miss the null check too
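
       For illustration, here is a minimal, stdlib-only sketch of the null-preserving loop that both hunks are moving toward. It is not the arrow kernel itself: `filter_preserving_nulls`, the `is_null` slice, and the plain `&[&str]` input are hypothetical stand-ins for `FilterContext`, `input_array.is_null(..)`, and `input_array.value(..)`, and the mask table is only assumed to hold one bit per position, as `self.filter_mask` appears to.

```rust
/// Hypothetical stand-in for the kernel's inner loop (not the arrow API).
/// `filter_batches` holds the filter as 64-bit words, one bit per row.
fn filter_preserving_nulls<'a>(
    values: &[&'a str],
    is_null: &[bool],
    filter_batches: &[u64],
) -> Vec<Option<&'a str>> {
    // One single-bit mask per position in a batch (assumed shape of `filter_mask`).
    let filter_mask: Vec<u64> = (0..64).map(|j| 1u64 << j).collect();
    let mut out = Vec::new();

    for (i, &filter_batch) in filter_batches.iter().enumerate() {
        if filter_batch == 0 {
            // Nothing selected in this batch of 64 rows.
            continue;
        }
        for j in 0..64 {
            if (filter_batch & filter_mask[j]) != 0 {
                let data_index = (i * 64) + j;
                if data_index >= values.len() {
                    break;
                }
                // Without this check a null slot would be copied as a real value.
                if is_null[data_index] {
                    out.push(None)
                } else {
                    out.push(Some(values[data_index]))
                }
            }
        }
    }
    out
}
```

       With values `["a", "", "c", "d"]`, row 1 marked null, and filter bits `0b1011`, the sketch yields `[Some("a"), None, Some("d")]`; the old `values.push(input_array.value(data_index))` form would have produced an empty string in place of the null.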

##########
File path: rust/arrow/src/compute/kernels/filter.rs
##########
@@ -353,15 +353,19 @@ impl FilterContext {
                         // foreach bit in batch:
                         if (filter_batch & self.filter_mask[j]) != 0 {
                             let data_index = (i * 64) + j;
-                            values.push(input_array.value(data_index));
+                            if input_array.is_null(data_index) {
+                                values.push(None)
+                            } else {
+                                values.push(Some(input_array.value(data_index)))
+                            }
                         }
                     }
                 }
                 Ok(Arc::new(BinaryArray::from(values)))
             }
             DataType::Utf8 => {
                 let input_array = array.as_any().downcast_ref::<StringArray>().unwrap();
-                let mut values: Vec<&str> = Vec::with_capacity(self.filtered_count);
+                let mut values: Vec<Option<&str>> = Vec::with_capacity(self.filtered_count);

Review comment:
       Note that using an `Option` is likely to increase the temporary storage requirements a bit.
   
   It would likely be possible to avoid this allocation entirely if we used the lower-level `ArrayBuilder::with_bit_buffer`.
   
   I chose to follow the style of the rest of this module, though I would love opinions on whether to benchmark and optimize this (maybe a follow-on JIRA ticket is enough?).
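
   To make the lower-level alternative concrete, here is a rough, stdlib-only sketch of writing the filtered output straight into Arrow-style buffers (offsets, value bytes, and a validity bitmap with 1 = valid, least-significant bit first) instead of collecting a temporary `Vec<Option<&str>>`. The names `RawStringColumn` and `build_filtered` are hypothetical, and the sketch does not call any arrow builder API; it only shows the shape of the idea under those assumptions.

```rust
/// Hypothetical container mirroring the variable-width string layout:
/// `offsets` has one more entry than there are rows.
struct RawStringColumn {
    offsets: Vec<i32>,
    data: Vec<u8>,
    validity: Vec<u8>,
    null_count: usize,
}

/// Copies the rows listed in `selected` directly into the output buffers.
/// `is_null` stands in for `input_array.is_null(..)`.
fn build_filtered(values: &[&str], is_null: &[bool], selected: &[usize]) -> RawStringColumn {
    let mut offsets = Vec::with_capacity(selected.len() + 1);
    let mut data = Vec::new();
    // One validity bit per output row, rounded up to whole bytes, all null to start.
    let mut validity = vec![0u8; (selected.len() + 7) / 8];
    let mut null_count = 0;

    offsets.push(0);
    for (out_index, &data_index) in selected.iter().enumerate() {
        if is_null[data_index] {
            // Null rows contribute no bytes; the offset simply repeats.
            null_count += 1;
        } else {
            data.extend_from_slice(values[data_index].as_bytes());
            validity[out_index / 8] |= 1 << (out_index % 8);
        }
        offsets.push(data.len() as i32);
    }

    RawStringColumn { offsets, data, validity, null_count }
}
```

   Whether this is actually faster or smaller than the `Option` version would need measuring; as a first data point, comparing `std::mem::size_of::<Option<&str>>()` with `std::mem::size_of::<&str>()` on the target toolchain shows how much the wrapper itself costs per element.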
   





