tustvold commented on a change in pull request #1248:
URL: https://github.com/apache/arrow-rs/pull/1248#discussion_r799817101



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -119,17 +155,83 @@ impl<'a> Iterator for SlicesIterator<'a> {
     }
 }
 
+/// An iterator of `usize` whose index in [`BooleanArray`] is true
+///
+/// This provides the best performance on all but the least selective predicates (which keep most
+/// / all rows), where the benefits of copying large runs instead favours [`SlicesIterator`]
+struct IndexIterator<'a> {
+    current_chunk: u64,
+    chunk_end_offset: usize,
+    remaining: usize,
+    iter: UnalignedBitChunkIterator<'a>,
+}
+
+impl<'a> IndexIterator<'a> {
+    fn new(filter: &'a BooleanArray, len: usize) -> Self {
+        assert_eq!(filter.null_count(), 0);
+        let data = filter.data();
+        let chunks =
+            UnalignedBitChunk::new(&data.buffers()[0], data.offset(), data.len());
+        let mut iter = chunks.iter();
+
+        let current_chunk = iter.next().unwrap_or(0);
+        let chunk_end_offset = 64 - chunks.lead_padding();
+
+        Self {
+            current_chunk,
+            chunk_end_offset,
+            remaining: len,
+            iter,
+        }
+    }
+}
+
+impl<'a> Iterator for IndexIterator<'a> {
+    type Item = usize;
+
+    fn next(&mut self) -> Option<Self::Item> {
+        while self.remaining != 0 {
+            if self.current_chunk != 0 {
+                let bit_pos = self.current_chunk.trailing_zeros();
+                self.current_chunk ^= 1 << bit_pos;
+                self.remaining -= 1;
+                return Some(self.chunk_end_offset + (bit_pos as usize) - 64);

Review comment:
       Yeah, this is just to avoid using signed indexes. I've benchmarked the
change and it makes no difference to performance, but I would rather keep using
`usize` for consistency. I'll add some clarifying comments.
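
For illustration only, here is a minimal sketch of the offset arithmetic in the hunk above. It is not the PR's code: `set_bit_indices` and its parameters are hypothetical names, the chunks are plain `u64` values rather than an `UnalignedBitChunk`, and the padding bits at the start of the first chunk are assumed to be zero. The first chunk logically starts at `-lead_padding`, which does not fit in `usize`, so the sketch tracks the chunk's end offset (`64 - lead_padding`, always positive) and subtracts 64 only after adding the bit position.

```rust
/// Returns the logical index of every set bit in `chunks`.
///
/// Hypothetical helper: the first `lead_padding` bits of the first chunk are
/// padding and are assumed to be zero.
fn set_bit_indices(chunks: &[u64], lead_padding: usize) -> Vec<usize> {
    assert!(lead_padding < 64);
    let mut out = Vec::new();

    // Exclusive end of the current chunk in logical index space.
    // The chunk's start would be `-lead_padding` for the first chunk,
    // so the end offset is tracked instead to stay within `usize`.
    let mut chunk_end_offset = 64 - lead_padding;

    for &chunk in chunks {
        let mut current = chunk;
        while current != 0 {
            let bit_pos = current.trailing_zeros() as usize;
            // Clear the lowest set bit so the next iteration finds the next one.
            current ^= 1u64 << bit_pos;
            // Add first, then subtract 64: any set bit in the first chunk has
            // `bit_pos >= lead_padding`, so the sum is at least 64 and the
            // subtraction cannot underflow.
            out.push(chunk_end_offset + bit_pos - 64);
        }
        chunk_end_offset += 64;
    }
    out
}
```

With `lead_padding = 5`, a first chunk with bits 5 and 7 set and a second chunk with bit 0 set, the function yields the logical indices 0, 2 and 59.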




