berkaysynnada commented on PR #14074: URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609448004

> Statistics can be helpful for optimizer rules, but they also allow short-circuiting computations. For example, min/max can be used to avoid evaluating a filter over a record batch and quickly throw away the whole thing.
>
> Sum statistics help with short-circuiting aggregation functions. For example, `SELECT SUM(a) FROM foo` becomes a constant time operation. Similarly, `AVG(a)` can be computed with `sum / row count`.
>
> > Why cannot you just use an AggregateExec having a sum accumulator?
>
> Because our file format already stores a pre-computed sum.

Thanks for the explanation. I see the reason now, and it makes sense.
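
For readers following the thread, the short-circuiting described in the quoted explanation can be sketched roughly as below. This is a minimal standalone illustration, not DataFusion's API: `ColumnStats` and the helper functions are hypothetical names, and the sketch assumes every stored statistic is exact and that the column contains no nulls.

```rust
/// Hypothetical per-column statistics, as a file format might store them.
/// `None` means the statistic is unknown; every `Some` is assumed exact.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
    sum: Option<i64>,
    row_count: Option<u64>,
}

/// Filter `a > threshold`: if even the largest value fails the predicate,
/// no row can pass, so the batch can be dropped without evaluating it.
fn can_skip_gt(stats: &ColumnStats, threshold: i64) -> bool {
    matches!(stats.max, Some(max) if max <= threshold)
}

/// Filter `a < threshold`: the mirror case, decided from the minimum.
fn can_skip_lt(stats: &ColumnStats, threshold: i64) -> bool {
    matches!(stats.min, Some(min) if min >= threshold)
}

/// `SELECT SUM(a) FROM foo` answered from the stored sum, with no scan.
fn sum_from_stats(stats: &ColumnStats) -> Option<i64> {
    stats.sum
}

/// `AVG(a)` as `sum / row count`; `None` when a statistic is missing
/// or the table is empty.
fn avg_from_stats(stats: &ColumnStats) -> Option<f64> {
    match (stats.sum, stats.row_count) {
        (Some(sum), Some(n)) if n > 0 => Some(sum as f64 / n as f64),
        _ => None,
    }
}

fn main() {
    // Example statistics for a column holding the integers 1 through 10.
    let stats = ColumnStats {
        min: Some(1),
        max: Some(10),
        sum: Some(55),
        row_count: Some(10),
    };

    println!("skip `a > 10`: {}", can_skip_gt(&stats, 10));
    println!("skip `a < 1`:  {}", can_skip_lt(&stats, 1));
    println!("SUM(a): {:?}", sum_from_stats(&stats));
    println!("AVG(a): {:?}", avg_from_stats(&stats));
}
```

When a statistic is unknown, each helper falls back to "cannot skip" or `None`, so the caller would have to evaluate the filter or run the aggregation as usual. The shortcut is only sound when the stored values are exact rather than estimates.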