alamb commented on PR #13293:
URL: https://github.com/apache/datafusion/pull/13293#issuecomment-2462820541
> Without knowing too much about the use case for inexact statistics, is it
> possible we may need _both_ inexact and "precise" upper/lower bounds for column
> statistics? I.e. a tight, inexact lower/upper bound, and then a looser "real"
> upper & lower bound.
>
> I can see this causing tension between parts of the codebase that benefit
> from tighter but inexact bounds and parts that benefit from having correct
> bounds.

I am also not super sure about the use case for inexact statistics. I think
there was some idea that knowing a value was likely close to 1M would be more
helpful than simply discarding the values.

However, almost all the operations I can think of (filtering, limit,
aggregation) don't make the output range larger than the input range, so an
exact input min/max remains a correct, if looser, bound on the output.

Maybe we could consider simply removing `Precision::Inexact` entirely 🤔 so we
would only have:
```rust
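pub enum Precision<T> {
    Exact(T),   // the value is known exactly
    AtMost(T),  // the true value is <= the wrapped value
    AtLeast(T), // the true value is >= the wrapped value
    Unknown,    // nothing is known about the value
}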
```
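A minimal sketch of how these variants could carry a column's max through a filter, assuming each variant wraps the bound value the way the current `Exact(T)` / `Inexact(T)` variants do. `max_after_filter` is a hypothetical helper for illustration, not an existing DataFusion function: since a filter only removes rows, an exact input max becomes an upper bound on the output max.

```rust
/// Hypothetical precision enum from the proposal; `T` is the statistic's
/// value type (e.g. a row count or a column min/max).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Precision<T> {
    Exact(T),   // the value is known exactly
    AtMost(T),  // the true value is <= the wrapped value
    AtLeast(T), // the true value is >= the wrapped value
    Unknown,    // nothing is known about the value
}

/// Hypothetical helper: the column max after applying a filter.
/// A filter can only remove rows, so the output max never exceeds the
/// input max, although it may be smaller.
fn max_after_filter<T: Clone>(input_max: &Precision<T>) -> Precision<T> {
    match input_max {
        // An exact or upper-bounded input max is still an upper bound.
        Precision::Exact(v) | Precision::AtMost(v) => Precision::AtMost(v.clone()),
        // A lower bound on the input max says nothing about the output max,
        // because the filter may drop the rows holding the large values.
        Precision::AtLeast(_) | Precision::Unknown => Precision::Unknown,
    }
}

fn main() {
    let input = Precision::Exact(1_000_000_i64);
    let output = max_after_filter(&input);
    assert_eq!(output, Precision::AtMost(1_000_000));
    println!("{:?}", output);
}
```

The same reasoning applies in mirror image to a column's min, which would become `AtLeast` after a filter.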
I still feel that having `Precision::Bounded` would be ideal, since it would let
us reuse all the existing `Interval` logic, but that feels like too large a
change to me. But maybe not.

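For comparison, a rough sketch of what a `Precision::Bounded` variant might carry: both a lower and an upper bound, so applying a filter becomes an interval intersection. `Bounded` and `intersect` are hypothetical names, and the struct is a minimal stand-in over `i64`, not DataFusion's existing `Interval` type.

```rust
/// Hypothetical `Bounded` precision: the true value lies in `[lo, hi]`.
/// A minimal stand-in for reusing interval logic; NOT DataFusion's `Interval`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bounded {
    lo: i64,
    hi: i64,
}

impl Bounded {
    /// Intersect two closed ranges, e.g. a column's input range with the
    /// range implied by a filter predicate. Returns `None` when they do not
    /// overlap, meaning the filter can select no rows.
    fn intersect(&self, other: &Bounded) -> Option<Bounded> {
        let lo = self.lo.max(other.lo);
        let hi = self.hi.min(other.hi);
        if lo <= hi {
            Some(Bounded { lo, hi })
        } else {
            None
        }
    }
}

fn main() {
    // Column statistics say every value is in [0, 100].
    let input = Bounded { lo: 0, hi: 100 };
    // The predicate `x >= 10` implies x is in [10, i64::MAX].
    let predicate = Bounded { lo: 10, hi: i64::MAX };
    let output = input.intersect(&predicate);
    assert_eq!(output, Some(Bounded { lo: 10, hi: 100 }));
    println!("{:?}", output);
}
```

Here `i64::MAX` stands in for an unbounded side; a real implementation would presumably work on typed scalar values instead.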
I wonder if @berkaysynnada has any thoughts or insights?
--
This is an automated message from the Apache Git Service.