ozankabak commented on code in PR #14699:
URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966506841


##########
datafusion/physical-expr-common/src/physical_expr.rs:
##########
@@ -144,6 +153,111 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {
         Ok(Some(vec![]))
     }
 
+    /// Computes the output statistics for the expression, given the input
+    /// statistics.
+    ///
+    /// # Parameters
+    ///
+    /// * `children` are the statistics for the children (inputs) of this
+    ///   expression.
+    ///
+    /// # Returns
+    ///
+    /// A `Result` containing the output statistics for the expression in
+    /// case of success, or an error object in case of failure.
+    ///
+    /// Expressions (should) implement this function and utilize the independence
+    /// assumption, match on children distribution types and compute the output
+    /// statistics accordingly. The default implementation simply creates an
+    /// unknown output distribution by combining input ranges. This logic loses
+    /// distribution information, but is a safe default.
+    fn evaluate_statistics(&self, children: &[&StatisticsV2]) -> Result<StatisticsV2> {

Review Comment:
   >  if it is only meant to represent distributions of values
   
   This is indeed the case. It will replace `Precision` in the current code.
   
   The hierarchy we had in mind was:
   1. `Statistics(V2)`: Represents statistical information (e.g. distribution, mean, variance) about a single (possibly unknown) value or an estimate. This is the focus of this PR, which provides the baseline mechanism to evaluate this for arbitrary expressions.
   2. `ColumnStatistics`: Will collect a set of `Statistics(V2)` objects that represent estimates about the population of values in a column, e.g. its maximum value, average, etc.
   3. `TableStatistics`: Similar to 2, but for relations.
   
   A revamp of the current implementations of 2 and 3, based on 1, will be the focus of subsequent PRs.
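
   To make the layering concrete, here is a rough sketch of how the three levels could nest. Every name below (`ValueEstimate`, `ColumnEstimates`, `TableEstimates`, their fields, and the `known_span` helper) is hypothetical and chosen only for illustration; none of it is the API in this PR or the planned shape of `ColumnStatistics` / `TableStatistics`.

```rust
// Hypothetical types for illustration only; these are NOT DataFusion's API.

/// Level 1: what is known about a single (possibly unknown) value.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ValueEstimate {
    /// The value is known exactly.
    Exact(f64),
    /// Only lower and upper bounds are known.
    Range { low: f64, high: f64 },
    /// The value is modeled as a Gaussian.
    Gaussian { mean: f64, variance: f64 },
}

impl ValueEstimate {
    /// Mean of the estimate, when one can be derived.
    fn mean(&self) -> Option<f64> {
        match self {
            ValueEstimate::Exact(v) => Some(*v),
            // Bounds alone do not pin down a mean in this sketch.
            ValueEstimate::Range { .. } => None,
            ValueEstimate::Gaussian { mean, .. } => Some(*mean),
        }
    }
}

/// Level 2: per-column facts, each of which is itself a level-1 estimate.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct ColumnEstimates {
    min: ValueEstimate,
    max: ValueEstimate,
    null_count: ValueEstimate,
}

impl ColumnEstimates {
    /// Difference between the means of `max` and `min`, if both are available.
    fn known_span(&self) -> Option<f64> {
        Some(self.max.mean()? - self.min.mean()?)
    }
}

/// Level 3: the same idea for a whole relation.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct TableEstimates {
    num_rows: ValueEstimate,
    total_byte_size: ValueEstimate,
}
```

   The point of the sketch is only that levels 2 and 3 hold level-1 objects as their fields, so a column's maximum can be exact, a range, or a full distribution without the column-level type needing to know which.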



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

