appletreeisyellow commented on code in PR #9223:
URL: https://github.com/apache/arrow-datafusion/pull/9223#discussion_r1525632419
##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -903,6 +1004,42 @@ impl<'a> PruningExpressionBuilder<'a> {
self.required_columns
.max_column_expr(&self.column, &self.column_expr, self.field)
}
+
+ /// Note that this function intentionally overwrites the column expression to [`phys_expr::Column`].
+ /// i.e. expressions like [`phys_expr::CastExpr`] or [`phys_expr::TryCastExpr`] will be overwritten.
+ ///
+ /// This is to avoid cases like `cast(x_null_count)` or `try_cast(x_null_count)`.
Review Comment:
If the query predicate is `WHERE cast(x AS data_type)`, the original rewrite turns `x_min` into `cast(x_min AS data_type)` and `x_max` into `cast(x_max AS data_type)`. The same happens for `try_cast()`. See here:
https://github.com/appletreeisyellow/arrow-datafusion/blob/chunchun/pruning-predicate-column-known-tobe-null/datafusion/core/src/physical_optimizer/pruning.rs#L1075-L1101
We don't want `cast()` and `try_cast()` to rewrite `x_null_count` into `cast(x_null_count AS data_type)`: the null count is always a `UInt64` count, independent of the column's data type. This step therefore prevents `cast(x_null_count AS data_type)` from being generated.
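To make the difference concrete, here is a minimal, std-only Rust sketch. The `Expr` enum and the `min_max_expr` / `null_count_expr` helpers are hypothetical stand-ins, not DataFusion's API; the sketch only assumes statistics columns are named `<column>_min`, `<column>_max` and `<column>_null_count`, as in the comment above. The min/max rewrite keeps any wrapping cast, while the null-count rewrite looks through the casts and references the bare column, which is the effect of overwriting the column expression with a plain `Column`.

```rust
/// A toy expression tree: a column reference, optionally wrapped in casts.
/// Hypothetical; it only mimics the shape of DataFusion's physical expressions.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    /// cast(expr AS data_type)
    Cast(Box<Expr>, String),
    /// try_cast(expr AS data_type)
    TryCast(Box<Expr>, String),
}

impl Expr {
    /// Render the expression in SQL-like form.
    fn to_sql(&self) -> String {
        match self {
            Expr::Column(name) => name.clone(),
            Expr::Cast(inner, ty) => format!("cast({} AS {})", inner.to_sql(), ty),
            Expr::TryCast(inner, ty) => format!("try_cast({} AS {})", inner.to_sql(), ty),
        }
    }
}

/// Name of the underlying column, looking through any casts.
fn base_column(expr: &Expr) -> &str {
    match expr {
        Expr::Column(name) => name.as_str(),
        Expr::Cast(inner, _) | Expr::TryCast(inner, _) => base_column(inner),
    }
}

/// min/max rewrite: swap the column for its statistics column but keep the
/// surrounding casts, e.g. cast(x AS T) -> cast(x_min AS T).
fn min_max_expr(expr: &Expr, suffix: &str) -> Expr {
    match expr {
        Expr::Column(name) => Expr::Column(format!("{name}_{suffix}")),
        Expr::Cast(inner, ty) => Expr::Cast(Box::new(min_max_expr(inner, suffix)), ty.clone()),
        Expr::TryCast(inner, ty) => {
            Expr::TryCast(Box::new(min_max_expr(inner, suffix)), ty.clone())
        }
    }
}

/// null_count rewrite: drop the casts and reference the bare statistics column,
/// so cast(x AS T) -> x_null_count rather than cast(x_null_count AS T).
fn null_count_expr(expr: &Expr) -> Expr {
    Expr::Column(format!("{}_null_count", base_column(expr)))
}

fn main() {
    let x = || Box::new(Expr::Column("x".to_string()));
    let cast = Expr::Cast(x(), "Int64".to_string());
    let try_cast = Expr::TryCast(x(), "Int64".to_string());

    for e in [&cast, &try_cast] {
        println!("{}", min_max_expr(e, "min").to_sql());
        println!("{}", min_max_expr(e, "max").to_sql());
        println!("{}", null_count_expr(e).to_sql());
    }
}
```

For `cast(x AS Int64)` this yields `cast(x_min AS Int64)`, `cast(x_max AS Int64)` and plain `x_null_count`.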
##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -903,6 +1004,42 @@ impl<'a> PruningExpressionBuilder<'a> {
self.required_columns
.max_column_expr(&self.column, &self.column_expr, self.field)
}
+
+ /// Note that this function intentionally overwrites the column expression to [`phys_expr::Column`].
+ /// i.e. expressions like [`phys_expr::CastExpr`] or [`phys_expr::TryCastExpr`] will be overwritten.
+ ///
+ /// This is to avoid cases like `cast(x_null_count)` or `try_cast(x_null_count)`.
+ fn null_count_column_expr(&mut self) -> Result<Arc<dyn PhysicalExpr>> {
+ // overwrite to [`phys_expr::Column`]
+ let column_expr = Arc::new(self.column.clone()) as _;
+
+ // null_count is DataType::UInt64, which is different from the column's data type (i.e. self.field)
+ let null_count_field = &Field::new(self.field.name(), DataType::UInt64, true);
+
+ self.required_columns.null_count_column_expr(
+ &self.column,
+ &column_expr,
+ null_count_field,
+ )
+ }
+
+ /// Note that this function intentionally overwrites the column expression to [`phys_expr::Column`].
Review Comment:
See
https://github.com/apache/arrow-datafusion/pull/9223#discussion_r1525632419
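In `null_count_column_expr` quoted above, the field handed to `required_columns` is rebuilt with `DataType::UInt64` instead of reusing `self.field`. The std-only Rust sketch below illustrates that typing rule. `DataType`, `Field`, `Stat` and `stat_field` are simplified, hypothetical stand-ins, not the real `arrow_schema` or DataFusion types: min/max statistics reuse the column's own field, while the null-count field keeps the column name but is always a nullable `UInt64`.

```rust
/// Hypothetical stand-ins for a few Arrow data types and for `Field`;
/// not the real `arrow_schema` types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    UInt64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

#[derive(Debug, Clone, Copy)]
enum Stat {
    Min,
    Max,
    NullCount,
}

/// Field describing the statistics value for `column`:
/// min/max keep the column's own field, null_count is always UInt64.
fn stat_field(column: &Field, stat: Stat) -> Field {
    match stat {
        Stat::Min | Stat::Max => column.clone(),
        // Mirrors `Field::new(self.field.name(), DataType::UInt64, true)` in the PR.
        Stat::NullCount => Field {
            name: column.name.clone(),
            data_type: DataType::UInt64,
            nullable: true,
        },
    }
}

fn main() {
    let columns = [
        Field { name: "a".to_string(), data_type: DataType::Int32, nullable: false },
        Field { name: "s".to_string(), data_type: DataType::Utf8, nullable: true },
    ];
    for column in &columns {
        for stat in [Stat::Min, Stat::Max, Stat::NullCount] {
            let f = stat_field(column, stat);
            println!("{}.{:?}: {:?} (nullable: {})", column.name, stat, f.data_type, f.nullable);
        }
    }
}
```

Keeping the count's type separate from the column's type is what makes a `cast(x_null_count AS data_type)` rewrite wrong in the first place.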