crepererum commented on PR #7793: URL: https://github.com/apache/arrow-datafusion/pull/7793#issuecomment-1764191447
I like the idea of "precision", however I have a few concerns regarding the current implementation: - **inexact:** What does "inexact" provide if you don't have ANY bounds whatsoever? I think this only makes sense if you can specify if the "inexact" guess is a at least also an upper/lower bound of the possible values. E.g. "row counts" can be an inexact upper bound if you have a filter but you just pass through the counts of the infiltered data. So I propose - **constraints:** If you have min+max value, I think you rarely want a totally random guess but they probably should be ends of a range and constraint each other. Also see point above. With that, I propose to change the `Precision` enum to: - `Exact`: speaks for itself, might be rare for floats though - `UpperBound`: The actual value is `<=` this given bound. - `LowerBound`: The actual value is `>=` this given bound. - `Absent`: speaks for itself For min/max value this probably leads to `min=LowerBound(...)` and `max=UpperBound(...)` in many cases which also nicely fits into the interval analysis that also exists in the DF code base already. A more drastic (but maybe better?) alternative is to keep `Inexact` but instead of filling it with `T`, fill it with [`Interval`](https://docs.rs/datafusion/latest/datafusion/physical_expr/intervals/struct.Interval.html). This provides you bounds and we already have a bunch of code around that type. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
