ozankabak commented on code in PR #4852:
URL: https://github.com/apache/arrow-datafusion/pull/4852#discussion_r1064177929
##########
datafusion/physical-expr/src/aggregate/correlation.rs:
##########
@@ -341,48 +365,44 @@ mod tests {
#[test]
fn correlation_i32_with_nulls_2() -> Result<()> {
Review Comment:
Thank you for adding this kind of test both here and in sqllogictests. We
hopefully will not have a regression on this in the future.
##########
datafusion/physical-expr/src/aggregate/covariance.rs:
##########
@@ -253,33 +253,33 @@ impl Accumulator for CovarianceAccumulator {
let mut arr1 = downcast_value!(values1, Float64Array).iter().flatten();
let mut arr2 = downcast_value!(values2, Float64Array).iter().flatten();
- for _i in 0..values1.len() {
- let value1 = arr1.next();
- let value2 = arr2.next();
+ for i in 0..values1.len() {
+ let value1 = match values1.is_valid(i) {
+ true => arr1.next(),
+ false => None,
+ };
+ let value2 = match values2.is_valid(i) {
+ true => arr2.next(),
+ false => None,
+ };
Review Comment:
Given you are matching on a boolean value, an if statement would be slightly
more idiomatic.
```suggestion
let value1 = if values1.is_valid(i) {
arr1.next()
} else {
None
};
let value2 = if values2.is_valid(i) {
arr2.next()
} else {
None
};
```
##########
datafusion/physical-expr/src/aggregate/covariance.rs:
##########
@@ -291,19 +291,20 @@ impl Accumulator for CovarianceAccumulator {
let mut arr1 = downcast_value!(values1, Float64Array).iter().flatten();
let mut arr2 = downcast_value!(values2, Float64Array).iter().flatten();
- for _i in 0..values1.len() {
- let value1 = arr1.next();
- let value2 = arr2.next();
+ for i in 0..values1.len() {
+ let value1 = match values1.is_valid(i) {
+ true => arr1.next(),
+ false => None,
+ };
+ let value2 = match values2.is_valid(i) {
+ true => arr2.next(),
+ false => None,
+ };
if value1.is_none() || value2.is_none() {
Review Comment:
Similar to the comment above, if we had Rust 1.65 I would use the `let-else`
construct to get rid of the unwraps.
##########
datafusion/physical-expr/src/aggregate/correlation.rs:
##########
@@ -145,14 +149,34 @@ impl Accumulator for CorrelationAccumulator {
}
fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> {
- self.covar.update_batch(values)?;
+ let values = if values[0].null_count() != 0 || values[1].null_count()
!= 0 {
Review Comment:
Good catch -- we indeed need to check for nulls jointly here. However, it
kind of bothers me that we will be rechecking for nulls (and for naught) in the
children accumulators. Maybe, in the future, we can write a more efficient
in-line accumulator for this that does not use any children so that this
inefficiency goes away.
##########
datafusion/physical-expr/src/aggregate/covariance.rs:
##########
@@ -291,19 +291,20 @@ impl Accumulator for CovarianceAccumulator {
let mut arr1 = downcast_value!(values1, Float64Array).iter().flatten();
let mut arr2 = downcast_value!(values2, Float64Array).iter().flatten();
- for _i in 0..values1.len() {
- let value1 = arr1.next();
- let value2 = arr2.next();
+ for i in 0..values1.len() {
+ let value1 = match values1.is_valid(i) {
+ true => arr1.next(),
+ false => None,
+ };
+ let value2 = match values2.is_valid(i) {
+ true => arr2.next(),
+ false => None,
+ };
Review Comment:
Same as above.
```suggestion
let value1 = if values1.is_valid(i) {
arr1.next()
} else {
None
};
let value2 = if values2.is_valid(i) {
arr2.next()
} else {
None
};
```
##########
datafusion/physical-expr/src/aggregate/covariance.rs:
##########
@@ -253,33 +253,33 @@ impl Accumulator for CovarianceAccumulator {
let mut arr1 = downcast_value!(values1, Float64Array).iter().flatten();
let mut arr2 = downcast_value!(values2, Float64Array).iter().flatten();
- for _i in 0..values1.len() {
- let value1 = arr1.next();
- let value2 = arr2.next();
+ for i in 0..values1.len() {
+ let value1 = match values1.is_valid(i) {
+ true => arr1.next(),
+ false => None,
+ };
+ let value2 = match values2.is_valid(i) {
+ true => arr2.next(),
+ false => None,
+ };
if value1.is_none() || value2.is_none() {
- if value1.is_none() && value2.is_none() {
- continue;
- } else {
- return Err(DataFusionError::Internal(
- "The two columns are not aligned".to_string(),
- ));
- }
- } else {
- let value1 = unwrap_or_internal_err!(value1);
- let value2 = unwrap_or_internal_err!(value2);
- let new_count = self.count + 1;
- let delta1 = value1 - self.mean1;
- let new_mean1 = delta1 / new_count as f64 + self.mean1;
- let delta2 = value2 - self.mean2;
- let new_mean2 = delta2 / new_count as f64 + self.mean2;
- let new_c = delta1 * (value2 - new_mean2) + self.algo_const;
-
- self.count += 1;
- self.mean1 = new_mean1;
- self.mean2 = new_mean2;
- self.algo_const = new_c;
+ continue;
}
+
+ let value1 = unwrap_or_internal_err!(value1);
+ let value2 = unwrap_or_internal_err!(value2);
Review Comment:
Had we migrated to Rust 1.65, I would suggest the much simpler:
```
let Some(value1) = value1 else { continue };
let Some(value2) = value2 else { continue };
```
@alamb, any information as to when we will make the switch?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]