berkaysynnada commented on code in PR #14699:
URL: https://github.com/apache/datafusion/pull/14699#discussion_r1963780732


##########
datafusion/physical-expr-common/src/physical_expr.rs:
##########
@@ -144,6 +153,111 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {
         Ok(Some(vec![]))
     }
 
+    /// Computes the output statistics for the expression, given the input
+    /// statistics.
+    ///
+    /// # Parameters
+    ///
+    /// * `children` are the statistics for the children (inputs) of this
+    ///   expression.
+    ///
+    /// # Returns
+    ///
+    /// A `Result` containing the output statistics for the expression in
+    /// case of success, or an error object in case of failure.
+    ///
+    /// Expressions (should) implement this function and utilize the independence
+    /// assumption, match on children distribution types and compute the output
+    /// statistics accordingly. The default implementation simply creates an
+    /// unknown output distribution by combining input ranges. This logic loses
+    /// distribution information, but is a safe default.
+    fn evaluate_statistics(&self, children: &[&StatisticsV2]) -> Result<StatisticsV2> {
+        let children_ranges = children
+            .iter()
+            .map(|c| c.range())
+            .collect::<Result<Vec<_>>>()?;
+        let children_ranges_refs = children_ranges.iter().collect::<Vec<_>>();
+        let output_interval = self.evaluate_bounds(children_ranges_refs.as_slice())?;
+        let dt = output_interval.data_type();
+        if dt.eq(&DataType::Boolean) {
+            let p = if output_interval.eq(&Interval::CERTAINLY_TRUE) {
+                ScalarValue::new_one(&dt)
+            } else if output_interval.eq(&Interval::CERTAINLY_FALSE) {
+                ScalarValue::new_zero(&dt)
+            } else {
+                ScalarValue::try_from(&dt)
+            }?;
+            StatisticsV2::new_bernoulli(p)
+        } else {
+            StatisticsV2::new_from_interval(output_interval)
+        }
+    }
+
+    /// Updates children statistics using the given parent statistic for this
+    /// expression.
+    ///
+    /// This is used to propagate statistics down through an expression tree.
+    ///
+    /// # Parameters
+    ///
+    /// * `parent` is the currently known statistics for this expression.
+    /// * `children` are the current statistics for the children of this expression.
+    ///
+    /// # Returns
+    ///
+    /// A `Result` containing a `Vec` of new statistics for the children (in order)
+    /// in case of success, or an error object in case of failure.
+    ///
+    /// If statistics propagation reveals an infeasibility for any child, returns

Review Comment:
   Your suggestion makes sense. I was also planning to prepare a clean-up PR 
with some minor refactors to ease understanding and improve usability, 
and this issue is one of them. I'm waiting for this PR to be merged.
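
   For readers following the hunk above, the default `evaluate_statistics` 
logic can be sketched standalone roughly as below. This is only an 
illustration under stated assumptions: `Range`, `Stats`, `default_statistics`, 
`add_bounds` and `lt_bounds` are hypothetical toy stand-ins, not DataFusion's 
`Interval`, `StatisticsV2` or `evaluate_bounds`. The sketch reduces each child 
to its range, combines the ranges, and then either keeps a range-only result or 
maps a Boolean range to a Bernoulli probability of 1, 0 or unknown.

```rust
// Toy stand-ins for illustration only; these are NOT DataFusion types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Range {
    /// Closed numeric interval [lo, hi].
    Numeric(f64, f64),
    /// Boolean interval: which truth values are still possible.
    Boolean { can_be_false: bool, can_be_true: bool },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Stats {
    /// Only the range is known; the distribution shape is not.
    Unknown(Range),
    /// Probability of `true`; `None` means the probability is unknown.
    Bernoulli(Option<f64>),
}

impl Stats {
    /// Reduces a statistic to the range of values it allows.
    fn range(&self) -> Range {
        match *self {
            Stats::Unknown(r) => r,
            Stats::Bernoulli(Some(p)) if p == 1.0 => Range::Boolean {
                can_be_false: false,
                can_be_true: true,
            },
            Stats::Bernoulli(Some(p)) if p == 0.0 => Range::Boolean {
                can_be_false: true,
                can_be_true: false,
            },
            Stats::Bernoulli(_) => Range::Boolean {
                can_be_false: true,
                can_be_true: true,
            },
        }
    }
}

/// Mirrors the shape of the default logic: drop distribution details, keep
/// ranges, combine them, and wrap the result. Any Boolean range that is not
/// certainly true or certainly false yields an unknown probability.
fn default_statistics(
    children: &[Stats],
    evaluate_bounds: impl Fn(&[Range]) -> Range,
) -> Stats {
    let ranges: Vec<Range> = children.iter().map(Stats::range).collect();
    match evaluate_bounds(&ranges) {
        Range::Boolean { can_be_false: false, can_be_true: true } => {
            Stats::Bernoulli(Some(1.0))
        }
        Range::Boolean { can_be_false: true, can_be_true: false } => {
            Stats::Bernoulli(Some(0.0))
        }
        Range::Boolean { .. } => Stats::Bernoulli(None),
        numeric => Stats::Unknown(numeric),
    }
}

/// Hypothetical bounds rule for `a + b`.
fn add_bounds(r: &[Range]) -> Range {
    match (r[0], r[1]) {
        (Range::Numeric(a_lo, a_hi), Range::Numeric(b_lo, b_hi)) => {
            Range::Numeric(a_lo + b_lo, a_hi + b_hi)
        }
        _ => panic!("expected two numeric ranges"),
    }
}

/// Hypothetical bounds rule for `a < b`.
fn lt_bounds(r: &[Range]) -> Range {
    match (r[0], r[1]) {
        (Range::Numeric(a_lo, a_hi), Range::Numeric(b_lo, b_hi)) => Range::Boolean {
            can_be_false: a_hi >= b_lo, // some a is not below some b
            can_be_true: a_lo < b_hi,   // some a is below some b
        },
        _ => panic!("expected two numeric ranges"),
    }
}
```

   In this sketch, disjoint input ranges make a comparison certain (a 
Bernoulli probability of exactly 0 or 1), while overlapping ranges leave the 
probability unknown. That is the information loss the doc comment describes 
as a safe default.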



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

