Re: [PR] Make `push_batch_with_filter` up to 3x faster for primitive types [arrow-rs]

via GitHub Mon, 15 Dec 2025 00:03:41 -0800


Dandandan commented on code in PR #8951:
URL: https://github.com/apache/arrow-rs/pull/8951#discussion_r2618385313



##########
arrow-buffer/src/builder/null.rs:
##########
@@ -193,6 +193,85 @@ impl NullBufferBuilder {
         }
     }
 
+    /// Extends this builder with validity values.
+    ///
+    ///
+    /// # Example
+    /// ```
+    /// # use arrow_buffer::NullBufferBuilder;
+    /// let mut builder = NullBufferBuilder::new(8);
+    /// let validities = [true, false, true, true];
+    /// builder.extend(validities.iter().copied());
+    /// assert_eq!(builder.len(), 4);
+    /// ```
+    pub fn extend<I: Iterator<Item = bool>>(&mut self, iter: I) {
+        let (lower, upper) = iter.size_hint();
+        let len = upper.expect("Iterator must have exact size_hint");
+        debug_assert_eq!(lower, len, "Iterator must have exact size_hint");
+
+        if len == 0 {
+            return;
+        }
+
+        // Materialize since we're about to append bits
+        self.materialize_if_needed();
+
+        let buf = self.bitmap_builder.as_mut().unwrap();
+        let start_len = buf.len();
+        // Advance to allocate space, initializing new bits to 0
+        buf.advance(len);
+
+        let slice = buf.as_slice_mut();
+        let mut bit_idx = start_len;
+        let end_bit = start_len + len;
+
+        // Process in chunks of 64 bits when byte-aligned for better performance

Review Comment:
   I just checked - this seems to be an additional ~30% improvement for null handling:
   
   ```
   filter: primitive, 8192, nulls: 0.1, selectivity: 0.1
                           time:   [2.4060 ms 2.4096 ms 2.4133 ms]
                           change: [−33.920% −32.902% −32.274%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   filter: primitive, 8192, nulls: 0.1, selectivity: 0.8
                           time:   [2.1610 ms 2.1666 ms 2.1728 ms]
                           change: [−29.488% −28.499% −27.767%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   ```
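
For readers without the full diff, here is a minimal, standalone sketch of the chunked approach the code comment describes: set bits one at a time until the write position is byte-aligned, then pack 64 validity values into a `u64` per step, then finish the tail bit by bit. The helper name `extend_bits` and the plain `Vec<u8>` bitmap are assumptions for illustration only; this is not the code in the PR and does not use the `NullBufferBuilder` internals.

```rust
/// Hypothetical helper (not part of arrow-rs): appends the validity bits
/// from `iter` to an LSB-first bitmap that already holds `start_bit` bits.
/// Assumes `bytes.len() == ceil(start_bit / 8)` and that unused trailing
/// bits are zero. Returns the new length in bits.
fn extend_bits<I: ExactSizeIterator<Item = bool>>(
    bytes: &mut Vec<u8>,
    start_bit: usize,
    mut iter: I,
) -> usize {
    let end_bit = start_bit + iter.len();
    // Grow with zeroed bytes so every new bit starts as 0 (null).
    bytes.resize((end_bit + 7) / 8, 0);
    let mut bit_idx = start_bit;

    // Head: set single bits until the write position is byte-aligned.
    while bit_idx < end_bit && bit_idx % 8 != 0 {
        if iter.next().expect("iterator shorter than its reported len") {
            bytes[bit_idx / 8] |= 1u8 << (bit_idx % 8);
        }
        bit_idx += 1;
    }

    // Body: pack 64 values into one word and store it as 8 little-endian
    // bytes, so bit `i` of the word lands at bitmap position `bit_idx + i`.
    while end_bit - bit_idx >= 64 {
        let mut word = 0u64;
        for i in 0..64 {
            let valid = iter.next().expect("iterator shorter than its reported len");
            word |= (valid as u64) << i;
        }
        let byte_idx = bit_idx / 8;
        bytes[byte_idx..byte_idx + 8].copy_from_slice(&word.to_le_bytes());
        bit_idx += 64;
    }

    // Tail: fewer than 64 values remain, set them one bit at a time.
    while bit_idx < end_bit {
        if iter.next().expect("iterator shorter than its reported len") {
            bytes[bit_idx / 8] |= 1u8 << (bit_idx % 8);
        }
        bit_idx += 1;
    }
    end_bit
}
```

Calling it with a bitmap that already holds 3 bits and an iterator of 150 values exercises all three phases: 5 head bits, two 64-bit words, and 17 tail bits. Whether this layout accounts for the numbers above would need to be confirmed against the actual implementation in the PR.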



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
