berkaysynnada commented on PR #14074: URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609417451
> I can't think of any other statistical quantities that would immediately help operators, so from our perspective it's only "sum" (we may also use sum to mean true-count for booleans).
>
> If this lands I can follow up with a PR to start using it in SUM, AVG operators. I guess the more contentious API change was adding `compute_statistics` to the `Expr` trait: https://github.com/apache/datafusion/pull/13736/files#diff-2b3f5563d9441d3303b57e58e804ab07a10d198973eed20e7751b5a20b955e42R156-R158
>
> @berkaysynnada is this something that would also remain compatible with the V2 API? I believe it is

As far as I know, the whole statistics concept was created to support optimization decisions: it informs the optimizer rules about the data arriving at each execution plan node. What I can't understand is how "sum" information helps in any kind of optimization process.

> to start using it in SUM, AVG operators

Please correct me if I have misunderstood your intention here and in https://github.com/apache/datafusion/pull/13736: are you proposing to add this "sum" info so that a result can be produced from it, as if it were normal batch data? As I said, the V2 API does not affect which kinds of statistics are preserved in the `Statistics{}` struct; it is more about consolidating the `Precision` and `Interval` objects to represent and compute any kind of statistical quantity.
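
For context, one reading of the idea being questioned above can be sketched as follows: if a plan node carries an exact "sum" statistic, a SUM or AVG result could be produced from statistics alone, without scanning batches. This is only an illustrative sketch. The `Precision`, `ColumnStats`, `sum_from_stats`, and `avg_from_stats` names below are hypothetical stand-ins defined locally, not DataFusion's actual API, and the sketch makes no claim about how either PR implements this.

```rust
/// Hypothetical stand-in for a precision wrapper: how far a statistic can be trusted.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision<T> {
    /// The value is known exactly.
    Exact(T),
    /// The value is only an estimate.
    Inexact(T),
    /// Nothing is known.
    Absent,
}

/// Hypothetical per-column statistics that carry a "sum" next to a null count.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    sum: Precision<f64>,
    null_count: Precision<usize>,
}

/// Answer SUM(col) from statistics alone.
/// Only an exact sum is usable as a query result; estimates are rejected.
fn sum_from_stats(col: &ColumnStats) -> Option<f64> {
    match col.sum {
        Precision::Exact(s) => Some(s),
        _ => None,
    }
}

/// Answer AVG(col) from statistics alone.
/// Requires an exact sum, an exact row count, and an exact null count,
/// and at least one non-null row to divide by.
fn avg_from_stats(num_rows: Precision<usize>, col: &ColumnStats) -> Option<f64> {
    match (col.sum, num_rows, col.null_count) {
        (Precision::Exact(s), Precision::Exact(n), Precision::Exact(nulls)) if n > nulls => {
            Some(s / (n - nulls) as f64)
        }
        _ => None,
    }
}
```

Under this reading, the statistic serves as a shortcut to the answer rather than as an input to a planning decision, and it is usable only when every quantity involved is exact. That distinction is what the question above is probing.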