[GitHub] [arrow-datafusion] matthewmturner commented on a change in pull request #1063: Optimize count agg expr with null column statistics

GitBox Thu, 30 Sep 2021 07:49:20 -0700


matthewmturner commented on a change in pull request #1063:
URL: https://github.com/apache/arrow-datafusion/pull/1063#discussion_r719485303




##########
File path: datafusion/src/physical_optimizer/aggregate_statistics.rs
##########
@@ -153,6 +159,37 @@ fn take_optimizable_count(
     None
 }
 
+fn take_optimizable_count_with_nulls(
+    agg_expr: &dyn AggregateExpr,
+    stats: &Statistics,
+) -> Option<(ScalarValue, String)> {
+    if let (Some(num_rows), Some(col_stats), Some(casted_expr)) = (
+        stats.num_rows,
+        &stats.column_statistics,
+        agg_expr.as_any().downcast_ref::<expressions::Min>(),
+    ) {
+        if casted_expr.expressions().len() == 1 {
+            // TODO optimize with exprs other than Column
+            if let Some(col_expr) = casted_expr.expressions()[0]
+                .as_any()
+                .downcast_ref::<expressions::Column>()
+            {
+                if let ColumnStatistics {
+                    null_count: Some(val),
+                    ..
+                } = &col_stats[col_expr.index()]
+                {
+                    return Some((
+                        ScalarValue::UInt64(Some((num_rows - val) as u64)),
+                        "COUNT(Uint8(1))".to_string(),

Review comment:
       My naive interpretation was that UInt8(1) was a required placeholder to 
go with the `num_rows` value in the line before it.  Meaning if the column was 
put there then it would actually perform the count calculation - which we dont 
want since were using stats.
   
   Would def be good to get a better explanation though!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] matthewmturner commented on a change in pull request #1063: Optimize count agg expr with null column statistics

Reply via email to