ozankabak commented on issue #8078:
URL: https://github.com/apache/datafusion/issues/8078#issuecomment-2551052253

   ```rust
   let input_intervals: Vec<&Interval> = ....;
   // wrap input intervals with Statistics
   let temp_statistics = Statistics::new_from_bounds(&input_intervals);
   // compute output column statistics
   let output_column_statistics = expr.column_statistics(&temp_statistics)?;
   // use the output value if it was known
   let output_interval = match output_column_statistics.value() {
     Precision::Absent | Precision::PointEstimation => None,
     Precision::Interval(interval) => Some(interval),
   };
   ```
   
   This usage is weird in contexts where the concept of statistics isn't even 
applicable. I agree that it will work, but only because we are forcing it. I 
think the right pattern is to have `column_statistics` simply use 
`evaluate_bounds` as a subroutine for computing hard bounds (and any other 
information of type (2) in my comment). For example, it would be quite natural 
for us to have things like `evaluate_probability` (not a great name!) that 
also take expressions and do some sort of probabilistic computation (maybe 
PDF related). Then, `column_statistics` would also use `evaluate_probability` 
as a subroutine.
   
   So I see something like `column_statistics` as a more general API that 
defers to lower level APIs to collect information.
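
As a rough sketch of that layering (not DataFusion's actual API): `Interval`, `ColumnStats`, and `Expr` below are hypothetical stand-ins invented for illustration, and only the method names `evaluate_bounds` and `column_statistics` come from the discussion. The point is just that the statistics-level call delegates to the bounds-level one, while callers with no notion of statistics can use `evaluate_bounds` directly.

```rust
// Hypothetical stand-in types for illustration only; these are NOT
// DataFusion's `Interval`, `Statistics`, or `PhysicalExpr`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    lo: i64,
    hi: i64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct ColumnStats {
    /// Hard bounds, when known.
    bounds: Option<Interval>,
    /// Placeholder for softer information (assumed field); a probabilistic
    /// subroutine such as `evaluate_probability` would fill this in.
    estimate: Option<f64>,
}

enum Expr {
    Column(usize),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Lower-level API: propagates hard bounds only, no statistics involved.
    fn evaluate_bounds(&self, inputs: &[Interval]) -> Interval {
        match self {
            Expr::Column(i) => inputs[*i],
            Expr::Literal(v) => Interval { lo: *v, hi: *v },
            Expr::Add(l, r) => {
                let a = l.evaluate_bounds(inputs);
                let b = r.evaluate_bounds(inputs);
                // Saturating arithmetic keeps the sketch panic-free on overflow.
                Interval {
                    lo: a.lo.saturating_add(b.lo),
                    hi: a.hi.saturating_add(b.hi),
                }
            }
        }
    }

    /// Higher-level API: collects information by deferring to lower-level
    /// subroutines. Here it only calls `evaluate_bounds`.
    fn column_statistics(&self, inputs: &[ColumnStats]) -> ColumnStats {
        // Conservative choice for this sketch: hard bounds are reported only
        // when every input column has known bounds.
        let bounds = inputs
            .iter()
            .map(|s| s.bounds)
            .collect::<Option<Vec<Interval>>>()
            .map(|known| self.evaluate_bounds(&known));
        ColumnStats { bounds, estimate: None }
    }
}
```

With this shape, a caller that only has intervals calls `evaluate_bounds` without wrapping anything in statistics, and a caller that has statistics gets the same bounds through `column_statistics`.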
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

