alamb commented on code in PR #14699:
URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966502387


##########
datafusion/physical-expr-common/src/physical_expr.rs:
##########
@@ -144,6 +153,111 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {
         Ok(Some(vec![]))
     }
 
+    /// Computes the output statistics for the expression, given the input
+    /// statistics.
+    ///
+    /// # Parameters
+    ///
+    /// * `children` are the statistics for the children (inputs) of this
+    ///   expression.
+    ///
+    /// # Returns
+    ///
+    /// A `Result` containing the output statistics for the expression in
+    /// case of success, or an error object in case of failure.
+    ///
+    /// Expressions (should) implement this function and utilize the independence
+    /// assumption, match on children distribution types and compute the output
+    /// statistics accordingly. The default implementation simply creates an
+    /// unknown output distribution by combining input ranges. This logic loses
+    /// distribution information, but is a safe default.
+    fn evaluate_statistics(&self, children: &[&StatisticsV2]) -> Result<StatisticsV2> {

Review Comment:
   I did some more research into the current code, which has:
   1. [`Statistics`](https://docs.rs/datafusion/latest/datafusion/common/struct.Statistics.html), which has table-level statistics, such as statistics for columns, the row count, and the distinct count
   2. [`ColumnStatistics`](https://docs.rs/datafusion/latest/datafusion/common/struct.ColumnStatistics.html), which has column-level statistics
   
   In this PR:
   1. `[&StatisticsV2]` is equivalent to `Statistics` (distribution of multiple columns)
   2. `StatisticsV2` is equivalent to `ColumnStatistics` (distribution of a single column)
   
   To keep the names consistent, I recommend:
   1. Renaming `StatisticsV2` to `ColumnStatisticsV2`
   2. Introducing `StatisticsV2` that holds a set of column statistics
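
   A rough sketch of what that naming could look like. The `lower`/`upper` fields and the `as_children` helper are made up for illustration and are not part of this PR; the only thing taken from the existing code is that `Statistics` holds its per-column entries in a `column_statistics` vector.

```rust
/// Hypothetical stand-in for the per-column type this PR currently calls
/// `StatisticsV2`. The fields are assumptions for illustration only; the
/// real type describes a distribution, not just a range.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnStatisticsV2 {
    pub lower: f64,
    pub upper: f64,
}

/// Hypothetical container, mirroring how `Statistics` holds a
/// `Vec<ColumnStatistics>` in its `column_statistics` field today.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct StatisticsV2 {
    pub column_statistics: Vec<ColumnStatisticsV2>,
}

impl StatisticsV2 {
    /// Borrows the per-column entries as a vector of references, which can
    /// be passed as a slice in the shape `evaluate_statistics` takes for
    /// `children` in this PR. This helper is an assumption, not PR code.
    pub fn as_children(&self) -> Vec<&ColumnStatisticsV2> {
        self.column_statistics.iter().collect()
    }
}
```

   With names like these, the `children` argument would read as `&[&ColumnStatisticsV2]`, which makes it clearer that each element describes one input column.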
   
   UPDATE -- I think calling this `Distribution` might more accurately describe what it is trying to do
   
-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

