alamb commented on PR #13293: URL: https://github.com/apache/datafusion/pull/13293#issuecomment-2462820541
> Without knowing too much about the use case for inexact statistics, is it possible we may need _both_ inexact and "precise" upper/lower bounds for column statistics? I.e. a tight, inexact lower/upper bound, and then a looser "real" upper & lower bound.
>
> I can see this causing tension between parts of the codebase that benefit from tighter but inexact bounds and parts that benefit from having correct bounds.

I am also not super sure about the use case for inexact statistics. I think there was some idea that knowing a value was likely close to 1M would be more helpful than simply discarding the values.

However, almost all the operations I can think of (filtering, limit, aggregation) don't make the output range larger than the input.

Maybe we could consider simply removing `Precision::Inexact` entirely 🤔

So we would only have

```rust
Precision {
  Exact,
  AtMost,
  AtLeast,
  Unknown
}
```

I still do feel like having `Precision::Bounded` would be ideal to reuse all the existing `Interval` logic, but that feels like too large a change to me. But maybe not.

I wonder if @berkaysynnada has any thoughts or insights?
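
To make the four-variant idea a bit more concrete, here is a minimal sketch of how such an enum might propagate a row-count statistic through a filter and a limit. Everything in it is an assumption for illustration: the `usize` payloads, the `after_filter` and `after_limit` helper names, and the propagation rules are hypothetical and are not DataFusion's actual `Precision` type or API.

```rust
/// Hypothetical sketch of the proposed enum, specialised to row counts.
/// This is NOT DataFusion's real `Precision` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Precision {
    /// The value is known exactly.
    Exact(usize),
    /// The true value is guaranteed to be <= the payload.
    AtMost(usize),
    /// The true value is guaranteed to be >= the payload.
    AtLeast(usize),
    /// Nothing is known.
    Unknown,
}

impl Precision {
    /// Row count after a filter (hypothetical helper).
    /// A filter can only remove rows, so an upper bound survives,
    /// while any lower bound is lost.
    fn after_filter(self) -> Precision {
        match self {
            Precision::Exact(n) | Precision::AtMost(n) => Precision::AtMost(n),
            Precision::AtLeast(_) | Precision::Unknown => Precision::Unknown,
        }
    }

    /// Row count after `LIMIT k` (hypothetical helper).
    fn after_limit(self, k: usize) -> Precision {
        match self {
            Precision::Exact(n) => Precision::Exact(n.min(k)),
            Precision::AtMost(n) => Precision::AtMost(n.min(k)),
            // At least k input rows means the limit emits exactly k.
            Precision::AtLeast(n) if n >= k => Precision::Exact(k),
            // Otherwise only the upper bound k can be expressed
            // with a single variant.
            Precision::AtLeast(_) | Precision::Unknown => Precision::AtMost(k),
        }
    }
}
```

In this sketch neither operation ever needs an "inexact" result: the output is either still a correct bound or it degrades to `Unknown`, which is the property the argument above relies on.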