alamb commented on code in PR #21122:
URL: https://github.com/apache/datafusion/pull/21122#discussion_r3204292651


##########
datafusion/core/src/physical_planner.rs:
##########
@@ -1097,12 +1098,12 @@ impl DefaultPhysicalPlanner {
                     input_schema.as_arrow(),
                 )? {
                     PlanAsyncExpr::Sync(PlannedExprResult::Expr(runtime_expr)) => {
-                        FilterExecBuilder::new(
+                        let builder = FilterExecBuilder::new(

Review Comment:
   Nit: these changes seem unrelated to the rest of this PR.



##########
datafusion/physical-expr/src/projection.rs:
##########
@@ -125,12 +126,22 @@ impl From<ProjectionExpr> for (Arc<dyn PhysicalExpr>, String) {
 ///
 /// See [`ProjectionExprs::from_indices`] to select a subset of columns by
 /// indices.
-#[derive(Debug, Clone, PartialEq, Eq)]
+#[derive(Debug, Clone)]
 pub struct ProjectionExprs {
     /// [`Arc`] used for a cheap clone, which improves physical plan optimization performance.
     exprs: Arc<[ProjectionExpr]>,
+    /// Optional expression analyzer registry for statistics estimation

Review Comment:
   I think this is basically the same thing @xudong963 is saying in this
   comment:
   - https://github.com/apache/datafusion/pull/21122#pullrequestreview-4145718306



##########
datafusion/common/src/config.rs:
##########
@@ -1131,6 +1131,11 @@ config_namespace! {
         /// So if you disable `enable_topk_dynamic_filter_pushdown`, then enable `enable_dynamic_filter_pushdown`, the `enable_topk_dynamic_filter_pushdown` will be overridden.
         pub enable_dynamic_filter_pushdown: bool, default = true
 
+        /// When set to true, the pluggable `ExpressionAnalyzerRegistry` from
+        /// `SessionState` is used for expression-level statistics estimation
+        /// (NDV, selectivity, min/max, null fraction) in physical plan operators.
+        pub use_expression_analyzer: bool, default = false

Review Comment:
   I wonder why we need a new flag. In an ideal world, we would add the new
   extension API and then refactor the existing code to use it, while keeping
   the existing behavior as the default.
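
To make the suggestion concrete, here is a minimal sketch of what "extension API with the existing behavior as the default" could look like. All names below (`ExpressionAnalyzer`, `DefaultAnalyzer`, `PlannerContext`, `estimate_output_rows`) are hypothetical stand-ins, not the types in this PR or in DataFusion, and the `0.2` selectivity is only a placeholder for whatever the current built-in heuristic is.

```rust
use std::sync::Arc;

/// Hypothetical, simplified extension point for expression-level estimates.
trait ExpressionAnalyzer {
    /// Estimated fraction of rows that pass `predicate`, in `[0.0, 1.0]`.
    fn selectivity(&self, predicate: &str) -> f64;
}

/// Wraps the pre-existing heuristic, so behavior is unchanged unless a user
/// installs a different analyzer.
struct DefaultAnalyzer;

impl ExpressionAnalyzer for DefaultAnalyzer {
    fn selectivity(&self, _predicate: &str) -> f64 {
        0.2 // placeholder for the existing built-in estimate (assumption)
    }
}

/// Hypothetical holder for the analyzer. Note there is no boolean flag:
/// the default analyzer *is* the old behavior.
struct PlannerContext {
    analyzer: Arc<dyn ExpressionAnalyzer>,
}

impl Default for PlannerContext {
    fn default() -> Self {
        Self { analyzer: Arc::new(DefaultAnalyzer) }
    }
}

/// The existing code path, refactored to always go through the extension API.
fn estimate_output_rows(ctx: &PlannerContext, input_rows: u64, predicate: &str) -> u64 {
    (input_rows as f64 * ctx.analyzer.selectivity(predicate)).round() as u64
}

/// A user-provided analyzer that overrides one case and otherwise falls back
/// to the default.
struct EqualityAwareAnalyzer;

impl ExpressionAnalyzer for EqualityAwareAnalyzer {
    fn selectivity(&self, predicate: &str) -> f64 {
        if predicate.contains('=') {
            0.01
        } else {
            DefaultAnalyzer.selectivity(predicate)
        }
    }
}
```

With this shape, `PlannerContext::default()` reproduces today's estimates, and opting in means installing a different analyzer rather than flipping a config option.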



##########
datafusion/physical-expr/src/projection.rs:
##########
@@ -125,12 +126,22 @@ impl From<ProjectionExpr> for (Arc<dyn PhysicalExpr>, String) {
 ///
 /// See [`ProjectionExprs::from_indices`] to select a subset of columns by
 /// indices.
-#[derive(Debug, Clone, PartialEq, Eq)]
+#[derive(Debug, Clone)]
 pub struct ProjectionExprs {
     /// [`Arc`] used for a cheap clone, which improves physical plan optimization performance.
     exprs: Arc<[ProjectionExpr]>,
+    /// Optional expression analyzer registry for statistics estimation

Review Comment:
   It feels awkward to me that `ProjectionExprs` has an expression analyzer on
   it, as that expression analyzer seems like it is really there to be passed
   into a call to `StatisticsProvider::compute_statistics`.
   
   In other words, there is one `ExpressionAnalyzerRegistry` per plan, but
   putting it on fields makes it look like it could be per plan node. In fact,
   this isn't even a plan node (it is a field on a plan node).
   
   I also think that having to set the fields correctly means there is a real
   danger that it won't be plumbed through properly in the future.
   
   It seems to me like a better design would be to pass the
   `ExpressionAnalyzerRegistry` down to the callsite where it is needed -- for
   example, how about adding it as an argument to
   `StatisticsProvider::compute_statistics`? That would ensure it is always
   passed where needed, and it would remove a lot of the boilerplate in this PR.
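
A rough sketch of the shape I have in mind, assuming a heavily simplified plan tree. `AnalyzerRegistry`, `PlanNode`, `Scan`, `Filter`, and this `compute_statistics` signature are hypothetical and only illustrate the argument-passing pattern; they are not the real `StatisticsProvider` API.

```rust
/// Hypothetical stand-in for the plan-wide registry: exactly one per plan.
struct AnalyzerRegistry {
    /// Selectivity applied to every filter (illustrative only).
    filter_selectivity: f64,
}

/// Hypothetical, simplified statistics.
#[derive(Debug, PartialEq)]
struct Statistics {
    num_rows: u64,
}

/// The registry is a parameter, not a field, so no node (and no field on a
/// node) has to store it or remember to copy it when the plan is rewritten.
trait PlanNode {
    fn compute_statistics(&self, registry: &AnalyzerRegistry) -> Statistics;
}

struct Scan {
    rows: u64,
}

struct Filter {
    input: Box<dyn PlanNode>,
}

impl PlanNode for Scan {
    fn compute_statistics(&self, _registry: &AnalyzerRegistry) -> Statistics {
        Statistics { num_rows: self.rows }
    }
}

impl PlanNode for Filter {
    fn compute_statistics(&self, registry: &AnalyzerRegistry) -> Statistics {
        // The compiler forces the registry to be forwarded to children.
        let input = self.input.compute_statistics(registry);
        let rows = input.num_rows as f64 * registry.filter_selectivity;
        Statistics { num_rows: rows.round() as u64 }
    }
}
```

Because the registry only exists as a call argument, forgetting to plumb it through becomes a compile error instead of a silently missing field.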
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

