alamb commented on code in PR #2957:
URL: https://github.com/apache/arrow-rs/pull/2957#discussion_r1008026886


##########
arrow-array/src/array/boolean_array.rs:
##########
@@ -103,6 +103,53 @@ impl BooleanArray {
         &self.data.buffers()[0]
     }
 
+    /// Returns the number of true values within this buffer

Review Comment:
   ```suggestion
       /// Returns the number of non null, true values within this array
   ```
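
   To illustrate why the wording matters: a slot can be null while its underlying value bit is still set, and such slots should not be counted. A minimal sketch on plain bit-packed `u64` words rather than the arrow buffers (the name `masked_true_count` is made up for illustration and is not part of this PR):

   ```rust
   /// Counts slots that are both valid (validity bit set) and true (value bit set).
   /// `validity` and `values` are assumed to be bit-packed words of equal length.
   fn masked_true_count(validity: &[u64], values: &[u64]) -> usize {
       validity
           .iter()
           .zip(values.iter())
           // AND keeps only bits that are set in both the validity and value words
           .map(|(v, b)| (v & b).count_ones() as usize)
           .sum()
   }
   ```

   With validity `0b0110` and values `0b1100`, only one slot is both valid and true, even though two value bits are set.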



##########
arrow-array/src/array/boolean_array.rs:
##########
@@ -103,6 +103,53 @@ impl BooleanArray {
         &self.data.buffers()[0]
     }
 
+    /// Returns the number of true values within this buffer
+    pub fn true_count(&self) -> usize {
+        match self.data.null_buffer() {
+            Some(nulls) => {
+                let null_chunks = nulls.bit_chunks(self.offset(), self.len());
+                let value_chunks = self.values().bit_chunks(self.offset(), self.len());
+                null_chunks
+                    .iter()
+                    .zip(value_chunks.iter())
+                    .chain(std::iter::once((
+                        null_chunks.remainder_bits(),
+                        value_chunks.remainder_bits(),
+                    )))
+                    .map(|(a, b)| (a & b).count_ones() as usize)
+                    .sum()
+            }
+            None => self
+                .values()
+                .count_set_bits_offset(self.offset(), self.len()),
+        }
+    }
+
+    /// Returns the number of false values within this buffer
+    pub fn false_count(&self) -> usize {
+        match self.data.null_buffer() {

Review Comment:
   Maybe this could be simplified to `self.len() - self.null_count() - self.true_count()`? I think that would be basically as fast.
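
   A rough sketch of the subtraction idea above, using a hypothetical stand-in type (`NullableBools` is not an arrow type; it only models slots as `Option<bool>` so the identity is easy to check):

   ```rust
   /// Hypothetical stand-in for a nullable boolean array.
   struct NullableBools {
       slots: Vec<Option<bool>>,
   }

   impl NullableBools {
       /// Total number of slots, null or not.
       fn len(&self) -> usize {
           self.slots.len()
       }

       /// Number of null slots.
       fn null_count(&self) -> usize {
           self.slots.iter().filter(|s| s.is_none()).count()
       }

       /// Number of non-null slots holding `true`.
       fn true_count(&self) -> usize {
           self.slots.iter().filter(|s| **s == Some(true)).count()
       }

       /// Every slot is exactly one of null, true, or false,
       /// so the false count is what remains after removing the other two.
       fn false_count(&self) -> usize {
           self.len() - self.null_count() - self.true_count()
       }
   }
   ```

   This trades a second pass over the bitmaps for one extra call to `true_count`, which is why I would expect the cost to be about the same.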



##########
arrow-array/src/array/boolean_array.rs:
##########
@@ -431,4 +479,29 @@ mod tests {
     fn test_from_array_data_validation() {
         let _ = BooleanArray::from(ArrayData::new_empty(&DataType::Int32));
     }
+
+    #[test]
+    fn test_true_false_count() {
+        let mut rng = thread_rng();
+
+        for _ in 0..10 {
+            let d: Vec<_> = (0..2000).map(|_| rng.gen_bool(0.5)).collect();

Review Comment:
   ```suggestion
               // no nulls
               let d: Vec<_> = (0..2000).map(|_| rng.gen_bool(0.5)).collect();
   ```



##########
arrow-array/src/array/boolean_array.rs:
##########
@@ -431,4 +479,29 @@ mod tests {
     fn test_from_array_data_validation() {
         let _ = BooleanArray::from(ArrayData::new_empty(&DataType::Int32));
     }
+
+    #[test]
+    fn test_true_false_count() {
+        let mut rng = thread_rng();
+
+        for _ in 0..10 {
+            let d: Vec<_> = (0..2000).map(|_| rng.gen_bool(0.5)).collect();
+            let b = BooleanArray::from(d.clone());
+
+            let expected_true = d.iter().filter(|x| **x).count();
+            assert_eq!(b.true_count(), expected_true);
+            assert_eq!(b.false_count(), d.len() - expected_true);
+
+            let d: Vec<_> = (0..2000)

Review Comment:
   ```suggestion
               // with nulls
               let d: Vec<_> = (0..2000)
   ```


