ozankabak commented on issue #8078:
URL: https://github.com/apache/datafusion/issues/8078#issuecomment-2551052253
```rust
let input_intervals: Vec<&Interval> = ....;
// wrap input intervals with Statistics
let temp_statistics = Statistics::new_from_bounds(&input_intervals);
// compute output column statistics
let output_column_statistics = expr.column_statistics(&temp_statistics)?;
// use the output value if it was known
let output_interval = match output_column_statistics.value() {
    Precision::Absent | Precision::PointEstimation => None,
    Precision::Interval(interval) => Some(interval),
};
```
This usage is weird in contexts where the concept of statistics isn't even
applicable. I agree that it will work, but only because we are forcing it. I
think the right pattern is to have `column_statistics` simply use
`evaluate_bounds` as a subroutine for computing hard bounds (and any other
information of type (2) in my comment). For example, it would be quite natural
for us to have things like `evaluate_probability` (not a great name!) that
also take expressions and do some sort of probabilistic computation (maybe
PDF related). Then, `column_statistics` would also use `evaluate_probability`
as a subroutine.
So I see something like `column_statistics` as a more general API that
defers to lower-level APIs to collect information.
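A minimal sketch of this layering, under loudly stated assumptions: the `Interval`, `Expr`, and `ColumnStatistics` types below are hypothetical toy stand-ins, not DataFusion's real types, and `evaluate_expected_value` is an invented placeholder for the `evaluate_probability` idea that assumes each input column is uniformly distributed over its interval. It shows `evaluate_bounds` being called directly on intervals with no statistics wrapper, while `column_statistics` defers to both lower-level functions.

```rust
// Hypothetical, simplified types: NOT DataFusion's real
// `Interval`, `Statistics`, or `PhysicalExpr` APIs.

/// Closed interval [lo, hi] of possible values.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    lo: f64,
    hi: f64,
}

/// A tiny expression tree: column references, literals, and addition.
enum Expr {
    Column(usize),
    Literal(f64),
    Add(Box<Expr>, Box<Expr>),
}

/// Lower-level API 1: hard bounds via interval arithmetic.
/// Needs only input intervals; no notion of statistics is involved.
fn evaluate_bounds(expr: &Expr, inputs: &[Interval]) -> Interval {
    match expr {
        Expr::Column(i) => inputs[*i],
        Expr::Literal(v) => Interval { lo: *v, hi: *v },
        Expr::Add(l, r) => {
            let (a, b) = (evaluate_bounds(l, inputs), evaluate_bounds(r, inputs));
            Interval { lo: a.lo + b.lo, hi: a.hi + b.hi }
        }
    }
}

/// Lower-level API 2 (hypothetical stand-in for `evaluate_probability`):
/// expected value, assuming each input column is uniformly distributed
/// over its interval. Expectation is linear, so this is exact for `Add`.
fn evaluate_expected_value(expr: &Expr, inputs: &[Interval]) -> f64 {
    match expr {
        Expr::Column(i) => (inputs[*i].lo + inputs[*i].hi) / 2.0,
        Expr::Literal(v) => *v,
        Expr::Add(l, r) => {
            evaluate_expected_value(l, inputs) + evaluate_expected_value(r, inputs)
        }
    }
}

/// Higher-level API: bundles whatever the lower-level APIs can compute.
#[derive(Debug, PartialEq)]
struct ColumnStatistics {
    bounds: Interval,
    expected_value: f64,
}

fn column_statistics(expr: &Expr, inputs: &[Interval]) -> ColumnStatistics {
    ColumnStatistics {
        bounds: evaluate_bounds(expr, inputs),
        expected_value: evaluate_expected_value(expr, inputs),
    }
}

fn main() {
    // a + b + 10, with a in [0, 4] and b in [1, 3]
    let expr = Expr::Add(
        Box::new(Expr::Add(Box::new(Expr::Column(0)), Box::new(Expr::Column(1)))),
        Box::new(Expr::Literal(10.0)),
    );
    let inputs = [Interval { lo: 0.0, hi: 4.0 }, Interval { lo: 1.0, hi: 3.0 }];

    // Bounds-only callers never construct statistics.
    let bounds = evaluate_bounds(&expr, &inputs);
    println!("{:?}", bounds); // Interval { lo: 11.0, hi: 17.0 }

    // Statistics callers get everything the lower-level APIs provide.
    let stats = column_statistics(&expr, &inputs);
    println!("{:?}", stats);
}
```

With this split, adding another lower-level analysis only changes `column_statistics`, and callers that just need hard bounds never go through a statistics wrapper.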