Fly-Style opened a new pull request, #14699: URL: https://github.com/apache/datafusion/pull/14699
## Rationale for this change

The Statistics framework in DataFusion is a foundational component for query planning and execution. It provides metadata about datasets, which drives optimization decisions and influences runtime behavior. This patch redesigns the Statistics representation as an enum-based structure that supports multiple distribution types, offering greater flexibility and expressiveness.

## What changes are included in this PR?

This patch presents a Statistics v2 framework with the following main points:

- Introduces an enum-based structure that supports multiple distribution types, which initially include:
  - Uniform distribution (interval)
  - Gaussian distribution, parametrized with `mean` and `variance`
  - Exponential distribution, parametrized with `rate` and `offset`
  - Bernoulli distribution, which holds a probability and is used as the resulting distribution of comparison operators
  - Unknown distribution, which abstracts any distribution that is not otherwise represented and serves as a fallback option
- Revamps the tree-based interval evaluation and propagation for the new statistics framework, while keeping the old statistics in the codebase. Most of the useful binary operators are supported, along with unary `negate` and logical `not`.
- Introduces new `interval_arithmetic` methods and extends existing ones.

## Are these changes tested?

Yes. These changes are tested mostly with unit tests, plus one integration test.

P.S. Although I opened this PR, @berkaysynnada and @ozankabak put a huge effort into shaping this change. I want to express my sincere gratitude to them.
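To make the enum-based representation described in this PR more concrete, here is a minimal Rust sketch. It is not the code from the PR: the type name `Dist`, its fields, the use of plain `f64` values, and the methods `mean`, `negate`, and `gt` are all hypothetical simplifications chosen for illustration. It also assumes the exponential distribution is a shifted one with support starting at `offset`, so that its mean is `offset + 1 / rate`.

```rust
/// Hypothetical, simplified stand-in for an enum-based statistics type.
/// Names and fields are illustrative assumptions, not the PR's actual API.
#[derive(Debug, Clone, PartialEq)]
enum Dist {
    /// Uniform over the closed interval [low, high].
    Uniform { low: f64, high: f64 },
    Gaussian { mean: f64, variance: f64 },
    /// Assumed here to be a shifted exponential with support [offset, +inf).
    Exponential { rate: f64, offset: f64 },
    /// Probability that a boolean expression evaluates to true.
    Bernoulli { p: f64 },
    /// Fallback when nothing more specific is known.
    Unknown,
}

impl Dist {
    /// Expected value, when the variant determines it.
    fn mean(&self) -> Option<f64> {
        match self {
            Dist::Uniform { low, high } => Some((low + high) / 2.0),
            Dist::Gaussian { mean, .. } => Some(*mean),
            Dist::Exponential { rate, offset } => Some(offset + 1.0 / rate),
            Dist::Bernoulli { p } => Some(*p),
            Dist::Unknown => None,
        }
    }

    /// Propagate the distribution through unary negation.
    /// Variants this sketch does not handle fall back to `Unknown`.
    fn negate(&self) -> Dist {
        match self {
            Dist::Uniform { low, high } => Dist::Uniform { low: -high, high: -low },
            Dist::Gaussian { mean, variance } => Dist::Gaussian {
                mean: -mean,
                variance: *variance,
            },
            _ => Dist::Unknown,
        }
    }

    /// Propagate through the comparison `x > threshold`.
    /// A comparison yields a Bernoulli distribution over true/false.
    fn gt(&self, threshold: f64) -> Dist {
        match self {
            Dist::Uniform { low, high } if high > low => {
                // Fraction of the interval that lies above the threshold.
                let p = ((high - threshold) / (high - low)).clamp(0.0, 1.0);
                Dist::Bernoulli { p }
            }
            _ => Dist::Unknown,
        }
    }
}
```

Under these assumptions, `Dist::Uniform { low: 0.0, high: 10.0 }.gt(7.5)` evaluates to `Dist::Bernoulli { p: 0.25 }`, which mirrors the description's point that comparison operators produce a Bernoulli result. Falling back to `Unknown` for unhandled cases keeps propagation total: every operator returns some distribution, even when no closed form is available.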