berkaysynnada commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609448004

   > Statistics can be helpful for optimizer rules, but they also allow
   > short-circuiting computations. For example, min/max can be used to avoid
   > evaluating a filter over a record batch and quickly throw away the whole thing.
   >
   > Sum statistics help with short-circuiting aggregation functions. For
   > example, `SELECT SUM(a) FROM foo` becomes a constant time operation. Similarly,
   > `AVG(a)` can be computed with `sum / row count`.
   >
   > > Why cannot you just use an AggregateExec having a sum accumulator?
   >
   > Because our file format already stores a pre-computed sum.
   
   Thanks for the explanation. I see the reason now, and it makes sense. 
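
To make the quoted reasoning concrete, here is a minimal sketch of both short-circuits. It is illustrative only and does not use DataFusion's actual `Statistics` API: `ColumnStats` and the three helper functions are hypothetical names, and the sketch assumes exact (not estimated) statistics over a non-null integer column.

```rust
/// Hypothetical per-file column statistics, as a file format might store
/// them in its metadata. This is NOT DataFusion's `Statistics` type.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64,
    max: i64,
    sum: i64,
    row_count: u64,
}

/// Returns true when no value can fall inside `[lo, hi]`, so a filter such
/// as `a BETWEEN lo AND hi` can discard the whole batch without looking at
/// individual rows.
fn can_skip_between(stats: &ColumnStats, lo: i64, hi: i64) -> bool {
    stats.max < lo || stats.min > hi
}

/// `SELECT SUM(a)` answered from metadata alone: the cost depends on the
/// number of files, not on the number of rows.
fn sum_from_stats(files: &[ColumnStats]) -> i64 {
    files.iter().map(|s| s.sum).sum()
}

/// `AVG(a)` derived as total sum / total row count; `None` when there are
/// no rows at all.
fn avg_from_stats(files: &[ColumnStats]) -> Option<f64> {
    let rows: u64 = files.iter().map(|s| s.row_count).sum();
    if rows == 0 {
        return None;
    }
    Some(sum_from_stats(files) as f64 / rows as f64)
}
```

With a pre-computed sum in the metadata, neither aggregate needs an accumulator to scan the data, which is the point made in the quoted reply.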


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

