alamb commented on issue #8078: URL: https://github.com/apache/arrow-datafusion/issues/8078#issuecomment-1810386490
I have been playing with and studying this code. The suggestion from @ozankabak and @berkaysynnada in https://github.com/apache/arrow-datafusion/issues/8078#issuecomment-1804546752 is very general and can represent many types of uncertainty in statistics. However, I haven't yet found cases where that full generality is important. For example, I can't find (or think of) an important case where the lower bound would be known with certainty but the upper bound was uncertain, as opposed to simply being `TYPE::MAX`.

Another example would be a use case where it matters to distinguish between ranges like these:

```
min: PointEstimate::Absent,             max: PointEstimate::Precise(value)
min: PointEstimate::Precise(TYPE::MIN), max: PointEstimate::Precise(value)
```

Thus I am going to prototype what adding a `Bounded` variant to `Precision` looks like. I also plan to encapsulate more of the checks into `Precision` so that, if we choose to go with a more general formulation, we won't have to change as much of the rest of the code.

```rust
use std::fmt::Debug;

#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub enum Precision<T: Debug + Clone + PartialEq + Eq + PartialOrd> {
    /// The exact value is known
    Exact(T),
    /// The exact value is not known, but the real value is known to be within
    /// the specified range: `lower <= value <= upper`.
    ///
    /// TODO: we could use `Interval` here instead, which could represent more
    /// complex cases (like open/closed bounds)
    Bounded { lower: T, upper: T },
    /// The value is not known exactly, but is likely close to this value.
    /// NOTHING can be assumed about the value for correctness in this case.
    Inexact(T),
    /// Nothing is known about the value
    #[default]
    Absent,
}
```

I'll report back here shortly with how it goes.
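A minimal sketch of what "encapsulating the checks into `Precision`" might look like, assuming the enum above. The helper methods `bounds` and `might_contain` are hypothetical names chosen for illustration. They are not existing DataFusion APIs. The design point is that only `Exact` and `Bounded` may be used to rule a value out, because `Inexact` carries no correctness guarantee.

```rust
use std::fmt::Debug;

/// The proposed `Precision` enum (doc comments trimmed for brevity).
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub enum Precision<T: Debug + Clone + PartialEq + Eq + PartialOrd> {
    Exact(T),
    Bounded { lower: T, upper: T },
    Inexact(T),
    #[default]
    Absent,
}

impl<T: Debug + Clone + PartialEq + Eq + PartialOrd> Precision<T> {
    /// Hypothetical helper: the guaranteed `[lower, upper]` range, if any.
    /// `Inexact` and `Absent` return `None` because they guarantee nothing.
    pub fn bounds(&self) -> Option<(T, T)> {
        match self {
            Precision::Exact(v) => Some((v.clone(), v.clone())),
            Precision::Bounded { lower, upper } => Some((lower.clone(), upper.clone())),
            Precision::Inexact(_) | Precision::Absent => None,
        }
    }

    /// Hypothetical helper: could the real value equal `v`?
    /// Returns `false` only when the statistics prove it cannot, which is
    /// the only answer an optimization such as pruning may act on.
    pub fn might_contain(&self, v: &T) -> bool {
        match self.bounds() {
            Some((lower, upper)) => lower <= *v && *v <= upper,
            None => true,
        }
    }
}

fn main() {
    // e.g. a row count known to lie between 10 and 100 rows
    let rows: Precision<u64> = Precision::Bounded { lower: 10, upper: 100 };
    assert!(rows.might_contain(&50));
    assert!(!rows.might_contain(&5)); // provably outside the range
    // An estimate alone never rules anything out
    assert!(Precision::Inexact(42u64).might_contain(&0));
    assert_eq!(Precision::<u64>::default(), Precision::Absent);
    println!("{:?}", rows.bounds()); // prints Some((10, 100))
}
```

In this sketch, callers ask `might_contain` instead of matching on variants themselves. Replacing `Bounded` with a richer `Interval` later would then, under this assumption, mostly touch these methods rather than every call site.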
