realno commented on a change in pull request #1547:
URL: https://github.com/apache/arrow-datafusion/pull/1547#discussion_r782594743



##########
File path: datafusion/src/physical_plan/expressions/variance.rs
##########
@@ -230,93 +235,186 @@ impl VarianceAccumulator {
         self.count
     }
 
-    pub fn get_mean(&self) -> ScalarValue {
-        self.mean.clone()
+    pub fn get_mean(&self) -> f64 {
+        self.mean
     }
 
-    pub fn get_m2(&self) -> ScalarValue {
-        self.m2.clone()
+    pub fn get_m2(&self) -> f64 {
+        self.m2
     }
 }
 
 impl Accumulator for VarianceAccumulator {
     fn state(&self) -> Result<Vec<ScalarValue>> {
         Ok(vec![
             ScalarValue::from(self.count),
-            self.mean.clone(),
-            self.m2.clone(),
+            ScalarValue::from(self.mean),
+            ScalarValue::from(self.m2),
         ])
     }
 
+    fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> {
+        let values = &cast(&values[0], &DataType::Float64)?;
+        let arr = values.as_any().downcast_ref::<Float64Array>().unwrap();
+
+        for i in 0..arr.len() {
+            let value = arr.value(i);
+
+            if value == 0_f64 && values.is_null(i) {
+                continue;
+            }
+            let new_count = self.count + 1;

Review comment:
       After some investigation, this approach does work as expected. The 
reason for the null check is because `downcast_ref` replace the `None` values 
into `0_f64` so we need to check in the original array when a 0 is observed. 
The proposed code checks the array after the type cast so it can't catch the 
nulls. I tried to find a good way to do similar things on the original array 
but yet to have any luck. I will dig a bit deeper later, please let me know if 
you a way to achieve this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to