rdettai commented on pull request #965:
URL: https://github.com/apache/arrow-datafusion/pull/965#issuecomment-917665454


   Thanks a lot @alamb for this very thorough review.
   
   One remark recurs throughout the review: how statistics are estimated when 
they are not exact. I completely agree with all of these points, most 
importantly:
   - a per-field definition of exact/inexact would be more flexible
   - we could specify much more precisely what inexact means (histograms, error 
percentage, value interval...), and thus propagate some very interesting information
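
   To make the per-field idea concrete, here is a rough sketch of what it could 
look like. None of this is in the PR: `Estimate`, `SketchStatistics`, `plus` and 
`union_stats` are hypothetical names used only for illustration, and the 
propagation rule (exact + exact stays exact, anything else degrades) is just one 
possible choice.

```rust
/// Hypothetical per-field precision marker (not part of the PR).
#[derive(Debug, Clone, PartialEq)]
enum Estimate<T> {
    /// The value is known to be correct.
    Exact(T),
    /// The value is a best-effort estimate.
    Inexact(T),
    /// Nothing is known about the value.
    Unknown,
}

impl Estimate<usize> {
    /// Sum two estimates: the result is exact only if both inputs are exact,
    /// inexact if both carry a value, and unknown otherwise.
    fn plus(&self, other: &Self) -> Self {
        use Estimate::*;
        match (self, other) {
            (Exact(a), Exact(b)) => Exact(a + b),
            (Exact(a) | Inexact(a), Exact(b) | Inexact(b)) => Inexact(a + b),
            _ => Unknown,
        }
    }
}

/// Hypothetical statistics struct where each field tracks its own precision.
#[derive(Debug, Clone, PartialEq)]
struct SketchStatistics {
    num_rows: Estimate<usize>,
    total_byte_size: Estimate<usize>,
}

/// Statistics of a union of two inputs: each field is propagated independently,
/// so an unknown byte size does not make the row count inexact.
fn union_stats(left: &SketchStatistics, right: &SketchStatistics) -> SketchStatistics {
    SketchStatistics {
        num_rows: left.num_rows.plus(&right.num_rows),
        total_byte_size: left.total_byte_size.plus(&right.total_byte_size),
    }
}
```

   With something like this, an exact row count could survive even when the byte 
size is unknown. A richer `Inexact` variant (interval, error bound) would be the 
natural next step, but that is exactly the extra arithmetic that needs 
justification first.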
   
   Both will be topics for further work. More importantly, they will require a 
thorough analysis to avoid bloating the codebase with complex arithmetic that 
no optimization rule ever uses (or that is used only by rules that never kick 
in). One current example is `total_byte_size`, which I ported from the 
previous `Statistics` abstraction: it is often tricky to compute and **is 
currently never used**.



