alamb commented on PR #13293:
URL: https://github.com/apache/datafusion/pull/13293#issuecomment-2462820541

   
   > Without knowing too much about the use case for inexact statistics, is it 
possible we may need _both_ inexact and "precise" upper/lower bounds for column 
statistics? I.e. a tight, inexact lower/upper bound, and then a looser "real" 
upper & lower bound.
   > 
   > I can see this causing tension between parts of the codebase that benefit 
from tighter but inexact bounds and parts that benefit from having correct 
bounds.
   
   
   I am also not super sure about the use case for inexact statistics. I think 
the idea was that knowing a value is likely close to 1M would be more helpful 
than simply discarding it.
   
   However, almost all the operations I can think of (filtering, limit, 
aggregation) don't make the output range larger than the input. 
   
   Maybe we could consider simply removing `Precision::Inexact` entirely 🤔 so 
we would only have:
   
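   ```rust
   enum Precision<T> {
     /// The value is known exactly
     Exact(T),
     /// The true value is no greater than this
     AtMost(T),
     /// The true value is no less than this
     AtLeast(T),
     /// Nothing is known about the value
     Unknown,
   }
   ```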
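   To make the proposal concrete, here is a rough sketch (not existing DataFusion code) of how a row-count statistic might propagate through a filter and a limit using only these four variants. The helpers `filter_row_count` and `limit_row_count` are hypothetical, and the enum is repeated so the snippet compiles on its own. Neither operator needs an `Inexact` variant: a filter turns an exact count into an upper bound, and a limit clamps it.

   ```rust
   /// Sketch of the proposed precision enum (no `Inexact` variant).
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   enum Precision<T> {
       Exact(T),
       AtMost(T),
       AtLeast(T),
       Unknown,
   }

   /// Hypothetical: row count after a filter. The filter may drop any
   /// number of rows, so an exact or upper-bounded input becomes an upper bound.
   fn filter_row_count(input: Precision<usize>) -> Precision<usize> {
       match input {
           Precision::Exact(n) | Precision::AtMost(n) => Precision::AtMost(n),
           // A lower bound on the input says nothing about how many rows survive.
           Precision::AtLeast(_) | Precision::Unknown => Precision::Unknown,
       }
   }

   /// Hypothetical: row count after `LIMIT fetch`.
   fn limit_row_count(input: Precision<usize>, fetch: usize) -> Precision<usize> {
       match input {
           Precision::Exact(n) => Precision::Exact(n.min(fetch)),
           Precision::AtMost(n) => Precision::AtMost(n.min(fetch)),
           // At least `fetch` rows arrive, so exactly `fetch` rows are returned.
           Precision::AtLeast(n) if n >= fetch => Precision::Exact(fetch),
           // Otherwise the only safe statement is that the limit caps the output.
           Precision::AtLeast(_) | Precision::Unknown => Precision::AtMost(fetch),
       }
   }
   ```

   For example, `filter_row_count(Precision::Exact(1_000_000))` yields `AtMost(1_000_000)`, and a subsequent `LIMIT 10` on an exact input yields `Exact(10)`; no step ever widens the range.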
   
   I still feel that having `Precision::Bounded` would be ideal, since it would 
let us reuse all the existing `Interval` logic, but that feels like too large a 
change to me. But maybe not.
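   As a rough illustration of the `Bounded` idea, the four variants can be folded into a single closed range where a missing side means "unbounded". This sketch hand-rolls the range instead of using DataFusion's `Interval` type, so `Bounded`, `intersect`, and the other names below are hypothetical, not real APIs.

   ```rust
   /// Hypothetical `Bounded` statistic: the true value lies in `[lower, upper]`,
   /// where `None` means unbounded on that side.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   struct Bounded<T> {
       lower: Option<T>,
       upper: Option<T>,
   }

   impl<T: Ord + Copy> Bounded<T> {
       // Each of the four proposed `Precision` variants maps to a range.
       fn exact(v: T) -> Self { Self { lower: Some(v), upper: Some(v) } }
       fn at_most(v: T) -> Self { Self { lower: None, upper: Some(v) } }
       fn at_least(v: T) -> Self { Self { lower: Some(v), upper: None } }
       fn unknown() -> Self { Self { lower: None, upper: None } }

       /// Combine two facts about the same value by intersecting the ranges.
       /// Note: this sketch does not detect an empty result (lower > upper).
       fn intersect(self, other: Self) -> Self {
           let lower = match (self.lower, other.lower) {
               (Some(a), Some(b)) => Some(a.max(b)),
               (a, b) => a.or(b),
           };
           let upper = match (self.upper, other.upper) {
               (Some(a), Some(b)) => Some(a.min(b)),
               (a, b) => a.or(b),
           };
           Self { lower, upper }
       }

       /// True when both bounds are present and equal.
       fn is_exact(&self) -> bool {
           matches!((self.lower, self.upper), (Some(l), Some(u)) if l == u)
       }
   }
   ```

   With this shape, combining an `AtLeast(10)` fact with an `AtMost(20)` fact is a single `intersect` call that yields the range `[10, 20]`, which the four-variant enum cannot represent directly.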
   
   I wonder if @berkaysynnada has any thoughts or insights?
   