berkaysynnada commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609417451

   > I can't think of any other statistical quantities that would immediately 
help operators, so from our perspective it's only "sum" (we may also use sum to 
mean true-count for booleans).
   > 
   > If this lands I can follow up with a PR to start using it in SUM, AVG 
operators. I guess the more contentious API change was adding 
`compute_statistics` to the `Expr` trait: 
https://github.com/apache/datafusion/pull/13736/files#diff-2b3f5563d9441d3303b57e58e804ab07a10d198973eed20e7751b5a20b955e42R156-R158
   > 
   > @berkaysynnada is this something that would also remain compatible with 
the V2 API? I believe it is
   
   
   As far as I know, the whole statistics concept was created, and is used, to 
help with optimization decisions: it informs the optimizer rules about the 
data that reaches any execution plan node. What I can't understand is how 
"sum" information helps in any kind of optimization process.
   
   > to start using it in SUM, AVG operators
   
   Please correct me if I have misunderstood your intention here and in 
https://github.com/apache/datafusion/pull/13736: are you proposing to add this 
"sum" info so that a result can be produced from it, as if it were normal batch 
data?
   
   As I said, the V2 API does not affect which kinds of statistics are kept in 
the Statistics{} struct; it is more about consolidating the Precision and 
Interval objects to represent and compute any kind of statistical quantity.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


