asolimando commented on code in PR #19957:
URL: https://github.com/apache/datafusion/pull/19957#discussion_r2736292455
##########
datafusion/physical-expr/src/projection.rs:
##########
@@ -660,9 +660,25 @@ impl ProjectionExprs {
}
}
} else {
- // TODO stats: estimate more statistics from expressions
- // (expressions should compute their statistics themselves)
- ColumnStatistics::new_unknown()
+ // TODO: expressions should compute their own statistics
Review Comment:
The new constructs are an extensibility layer that builds on top of the
existing methods rather than replacing them.
For instance, the `DefaultStatisticsProvider` will simply delegate to
`partition_statistics()`.
The key difference is that by adopting
`StatisticsProvider`/`StatisticsRegistry` pattern, the user can:
- Register custom providers for custom operators without trait changes
- Override default estimation for specific (even built-in) operators
- Carry extended statistics (histograms, sketches) through
`ExtendedStatistics`
Similarly, `PhysicalExpr::evaluate_statistics()` will be the default for
`ExpressionAnalyzer`, but it will allow to override the behavior for built-in
expressions, and it will easily provide an extension point for user-defined
functions.
I totally understand the concern about the blast radius, but if there is
consensus on doubling down on statistics and CBO (by making them more effective
in driving planning decisions, when stats are available), it will become
necessary to provide flexibility to override and extend for people implementing
on top of DataFusion, to steer planning decisions as the user sees fit.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]