rdettai commented on pull request #965: URL: https://github.com/apache/arrow-datafusion/pull/965#issuecomment-917665454
Thanks a lot @alamb for this very thorough review.

There is one recurring remark throughout the PR regarding the estimation of statistics when they are not exact. I completely agree with all of these remarks, most importantly:
- a per-field definition of exact/inexact would be more flexible
- we could specify much more accurately what inexact means (histograms, % of error, value interval...), and thus propagate some very interesting information

Obviously both will be topics for further work and, more importantly, they would require a very thorough analysis to avoid bloating the codebase with very complex arithmetic that is never actually used by any optimization rule (or is used only by rules that never kick in). One current example is the `total_byte_size` that I have ported from the previous `Statistics` abstraction: it is often tricky to compute and **is currently never used**.
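To make the per-field idea a bit more concrete, here is a minimal sketch of what it could look like. Everything in it is hypothetical: `Estimate`, `TableStats`, and `union` are names invented for illustration and are not part of this PR or of DataFusion's `Statistics` API. It only shows each field carrying its own precision, with a value interval as one possible meaning of "inexact".

```rust
// Hypothetical sketch: these types are illustrative only and do not exist
// in DataFusion under these names.

/// A statistic value tagged with how precisely it is known.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Estimate {
    /// The value is known precisely.
    Exact(usize),
    /// The true value lies within `[low, high]` (inclusive).
    Interval { low: usize, high: usize },
    /// Nothing is known about the value.
    Unknown,
}

impl Estimate {
    /// Inclusive bounds of the estimate, or `None` if nothing is known.
    fn bounds(self) -> Option<(usize, usize)> {
        match self {
            Estimate::Exact(v) => Some((v, v)),
            Estimate::Interval { low, high } => Some((low, high)),
            Estimate::Unknown => None,
        }
    }

    /// Add two estimates, e.g. the row counts of two inputs being unioned.
    /// The result is only as precise as the least precise operand.
    fn add(self, other: Estimate) -> Estimate {
        match (self.bounds(), other.bounds()) {
            (Some((a_low, a_high)), Some((b_low, b_high))) => {
                let low = a_low.saturating_add(b_low);
                let high = a_high.saturating_add(b_high);
                if low == high {
                    Estimate::Exact(low)
                } else {
                    Estimate::Interval { low, high }
                }
            }
            // If either side is unknown, so is the sum.
            _ => Estimate::Unknown,
        }
    }
}

/// Table-level statistics where each field has its own precision,
/// instead of a single exact/inexact flag for the whole struct.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TableStats {
    num_rows: Estimate,
    total_byte_size: Estimate,
}

impl TableStats {
    /// Statistics of the concatenation of two inputs.
    fn union(&self, other: &TableStats) -> TableStats {
        TableStats {
            num_rows: self.num_rows.add(other.num_rows),
            total_byte_size: self.total_byte_size.add(other.total_byte_size),
        }
    }
}
```

With this shape, a source that knows its row count exactly but not its byte size can still report the former as exact. The price is the extra arithmetic on every operator, which is the kind of complexity that would need to be justified by an optimization rule that actually consumes it.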
