asolimando commented on code in PR #19957:
URL: https://github.com/apache/datafusion/pull/19957#discussion_r2736292455


##########
datafusion/physical-expr/src/projection.rs:
##########
@@ -660,9 +660,25 @@ impl ProjectionExprs {
                     }
                 }
             } else {
-                // TODO stats: estimate more statistics from expressions
-                // (expressions should compute their statistics themselves)
-                ColumnStatistics::new_unknown()
+                // TODO: expressions should compute their own statistics

Review Comment:
   The new constructs are an extensibility layer that builds on top of the 
existing methods rather than replacing them.
   
   For instance, the `DefaultStatisticsProvider` will simply delegate to 
`partition_statistics()`.
   
   The key difference is that by adopting 
`StatisticsProvider`/`StatisticsRegistry` pattern, the user can:
   - Register custom providers for custom operators without trait changes
   - Override default estimation for specific (even built-in) operators
   - Carry extended statistics (histograms, sketches) through 
`ExtendedStatistics`
   
   Similarly, `PhysicalExpr::evaluate_statistics()` will be the default for 
`ExpressionAnalyzer`, but it will allow to override the behavior for built-in 
expressions, and it will easily provide an extension point for user-defined 
functions.
   
   I totally understand the concern about the blast radius, but if there is 
consensus on doubling down on statistics and CBO (by making them more effective 
in driving planning decisions, when stats are available), it will become 
necessary to provide flexibility to override and extend for people implementing 
on top of DataFusion, to steer planning decisions as the user sees fit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to