realno commented on a change in pull request #1547: URL: https://github.com/apache/arrow-datafusion/pull/1547#discussion_r782594743
########## File path: datafusion/src/physical_plan/expressions/variance.rs ########## @@ -230,93 +235,186 @@ impl VarianceAccumulator { self.count } - pub fn get_mean(&self) -> ScalarValue { - self.mean.clone() + pub fn get_mean(&self) -> f64 { + self.mean } - pub fn get_m2(&self) -> ScalarValue { - self.m2.clone() + pub fn get_m2(&self) -> f64 { + self.m2 } } impl Accumulator for VarianceAccumulator { fn state(&self) -> Result<Vec<ScalarValue>> { Ok(vec![ ScalarValue::from(self.count), - self.mean.clone(), - self.m2.clone(), + ScalarValue::from(self.mean), + ScalarValue::from(self.m2), ]) } + fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> { + let values = &cast(&values[0], &DataType::Float64)?; + let arr = values.as_any().downcast_ref::<Float64Array>().unwrap(); + + for i in 0..arr.len() { + let value = arr.value(i); + + if value == 0_f64 && values.is_null(i) { + continue; + } + let new_count = self.count + 1; Review comment: After some investigation, this approach does work as expected. The reason for the null check is because `downcast_ref` replace the `None` values into `0_f64` so we need to check in the original array when a 0 is observed. The proposed code checks the array after the type cast so it can't catch the nulls. I tried to find a good way to do similar things on the original array but yet to have any luck. I will dig a bit deeper later, please let me know if you a way to achieve this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org