Jefffrey commented on code in PR #21453:
URL: https://github.com/apache/datafusion/pull/21453#discussion_r3104877889


##########
datafusion/functions-aggregate/src/approx_distinct.rs:
##########
@@ -269,6 +303,67 @@ impl Debug for ApproxDistinct {
     }
 }
 
+/// Optimized accumulator for Boolean type - only 2 possible distinct values.
+#[derive(Debug)]
+struct BoolDistinctAccumulator {
+    seen_true: bool,
+    seen_false: bool,
+}
+
+impl BoolDistinctAccumulator {
+    fn new() -> Self {
+        Self {
+            seen_true: false,
+            seen_false: false,
+        }
+    }
+}
+
+impl Accumulator for BoolDistinctAccumulator {
+    fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> {
+        if values.is_empty() {
+            return Ok(());
+        }
+
+        let array: &BooleanArray = downcast_value!(values[0], BooleanArray);
+        for value in array.iter().flatten() {
+            if value {
+                self.seen_true = true;
+            } else {
+                self.seen_false = true;

Review Comment:
   If we wanna micro optimize even more, we can skip this iteration entirely if 
`seen_true` and `seen_false` are already set here



##########
datafusion/functions-aggregate/src/approx_distinct.rs:
##########
@@ -383,6 +510,7 @@ impl AggregateUDFImpl for ApproxDistinct {
             DataType::Utf8View => Box::new(StringViewHLLAccumulator::new()),
             DataType::Binary => Box::new(BinaryHLLAccumulator::<i32>::new()),
             DataType::LargeBinary => 
Box::new(BinaryHLLAccumulator::<i64>::new()),
+            DataType::Boolean => Box::new(BoolDistinctAccumulator::new()),

Review Comment:
   This seems newly added; do we have tests for it? Or we could split it out 
into a separate PR as it is not strictly performance related



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to