samuelcolvin commented on code in PR #10468:
URL: https://github.com/apache/datafusion/pull/10468#discussion_r1597704992
##########
datafusion/physical-plan/src/common.rs:
##########
@@ -153,16 +153,23 @@ pub fn compute_record_batch_statistics(
         })
         .sum();
 
-    let mut column_statistics = vec![ColumnStatistics::new_unknown(); projection.len()];
+    let mut null_counts = vec![0; projection.len()];
 
     for partition in batches.iter() {
         for batch in partition {
             for (stat_index, col_index) in projection.iter().enumerate() {
-                column_statistics[stat_index].null_count =
-                    Precision::Exact(batch.column(*col_index).null_count());
+                null_counts[stat_index] += batch.column(*col_index).null_count();

Review Comment:
   I think there would be fewer bounds checks if we used `zip` here, rather than `.enumerate()` and looking up each column by index.
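A minimal, self-contained sketch of the `zip` suggestion, for illustration only. `Batch` and its `null_count` method are hypothetical stand-ins for Arrow's `RecordBatch` and `batch.column(i).null_count()`, so the example runs without the arrow crate. Zipping `null_counts.iter_mut()` with `projection` removes the indexed write `null_counts[stat_index]`; the per-batch column lookup is still indexed in both versions, and whether the optimizer would already have elided the original check is not something this sketch measures.

```rust
// Hypothetical stand-in for an Arrow `RecordBatch`: each column is a
// vector of nullable values. This is NOT the arrow-rs API.
struct Batch {
    columns: Vec<Vec<Option<i64>>>,
}

impl Batch {
    /// Number of nulls in column `index` (panics if out of range, like a Vec index).
    fn null_count(&self, index: usize) -> usize {
        self.columns[index].iter().filter(|v| v.is_none()).count()
    }
}

/// Shape of the code in the PR: `enumerate()` plus an indexed write into `null_counts`.
fn null_counts_indexed(batches: &[Vec<Batch>], projection: &[usize]) -> Vec<usize> {
    let mut null_counts = vec![0; projection.len()];
    for partition in batches {
        for batch in partition {
            for (stat_index, col_index) in projection.iter().enumerate() {
                // `null_counts[stat_index]` is a bounds-checked index.
                null_counts[stat_index] += batch.null_count(*col_index);
            }
        }
    }
    null_counts
}

/// Suggested shape: pair each accumulator with its projected column index,
/// so no index into `null_counts` is needed.
fn null_counts_zipped(batches: &[Vec<Batch>], projection: &[usize]) -> Vec<usize> {
    let mut null_counts = vec![0; projection.len()];
    for partition in batches {
        for batch in partition {
            for (count, &col_index) in null_counts.iter_mut().zip(projection) {
                *count += batch.null_count(col_index);
            }
        }
    }
    null_counts
}

fn main() {
    // Two partitions, one batch each, three columns per batch.
    let batches = vec![
        vec![Batch { columns: vec![vec![Some(1), None], vec![None, None], vec![Some(3), Some(4)]] }],
        vec![Batch { columns: vec![vec![None, None], vec![Some(5), None], vec![None, Some(6)]] }],
    ];
    // Project column 2 first, then column 0.
    let projection = [2, 0];
    let indexed = null_counts_indexed(&batches, &projection);
    let zipped = null_counts_zipped(&batches, &projection);
    // Both versions accumulate the same totals across all batches.
    assert_eq!(indexed, zipped);
    println!("{:?}", zipped); // prints [1, 3]
}
```

Both functions sum null counts across every batch (the behavior the PR introduces, rather than overwriting with the last batch's count); the only difference is how the accumulator slot is reached.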