Fly-Style opened a new pull request, #14699:
URL: https://github.com/apache/datafusion/pull/14699
## Rationale for this change
The Statistics framework in DataFusion is a foundational component for query
planning and execution. It provides metadata about datasets, enabling
optimization decisions and influencing runtime behaviors. This patch
comprehensively redesigns the Statistics representation by transitioning to an
enum-based structure that supports multiple distribution types, offering
greater flexibility and expressiveness.
## What changes are included in this PR?
This patch introduces a Statistics v.2 framework. The main changes are:
- Introduces an enum-based structure that supports multiple distribution
  types, initially:
  - Uniform distribution (interval)
  - Gaussian distribution, parameterized by `mean` and `variance`
  - Exponential distribution, parameterized by `rate` and `offset`
  - Bernoulli distribution, which holds a probability and is the resulting
    distribution of comparison operators
  - Unknown distribution, which abstracts any distribution that is not
    otherwise represented and serves as a fallback option
- Revamps the tree-based interval evaluation and propagation for the new
  statistics framework, with support for the most useful binary operators,
  the unary `negate` operator, and the logical `not` operator. The old
  statistics remain in the codebase.
- Introduces new `interval_arithmetic` methods and extends existing ones.
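
To make the shape of the change easier to picture, here is a minimal sketch of what an enum-based distribution type could look like. It is an illustration only, not the code in this PR: the type name `Distribution`, the field names `low`, `high`, and `p`, and the methods `mean`, `negate`, `not`, and `gt_const` are all hypothetical, and the real implementation works with intervals and scalar values rather than plain `f64`.

```rust
/// Hypothetical, simplified stand-in for an enum-based statistics type.
/// Names and fields are assumptions made for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Distribution {
    /// Uniform over the closed interval [low, high].
    Uniform { low: f64, high: f64 },
    /// Gaussian parameterized by mean and variance.
    Gaussian { mean: f64, variance: f64 },
    /// Exponential with the given rate, shifted by offset.
    Exponential { rate: f64, offset: f64 },
    /// Bernoulli holding the probability of `true`.
    Bernoulli { p: f64 },
    /// Fallback for anything that cannot be represented.
    Unknown,
}

impl Distribution {
    /// Expected value, when it is known in closed form.
    pub fn mean(&self) -> Option<f64> {
        match self {
            Self::Uniform { low, high } => Some((*low + *high) / 2.0),
            Self::Gaussian { mean, .. } => Some(*mean),
            Self::Exponential { rate, offset } => Some(*offset + 1.0 / *rate),
            Self::Bernoulli { p } => Some(*p),
            Self::Unknown => None,
        }
    }

    /// Unary `negate`: maps X to -X where the result stays in the family.
    /// This sketch falls back to `Unknown` for everything else.
    pub fn negate(&self) -> Distribution {
        match self {
            Self::Uniform { low, high } => Self::Uniform { low: -*high, high: -*low },
            Self::Gaussian { mean, variance } => Self::Gaussian {
                mean: -*mean,
                variance: *variance,
            },
            _ => Self::Unknown,
        }
    }

    /// Logical `not`: only meaningful for a Bernoulli distribution.
    pub fn not(&self) -> Distribution {
        match self {
            Self::Bernoulli { p } => Self::Bernoulli { p: 1.0 - *p },
            _ => Self::Unknown,
        }
    }

    /// Comparison `X > c`: the result is a Bernoulli distribution.
    /// Only the uniform case is computed here; others fall back to `Unknown`.
    pub fn gt_const(&self, c: f64) -> Distribution {
        match self {
            Self::Uniform { low, high } if high > low => {
                let p = ((*high - c) / (*high - *low)).clamp(0.0, 1.0);
                Self::Bernoulli { p }
            }
            _ => Self::Unknown,
        }
    }
}
```

In this sketch, `Distribution::Uniform { low: 2.0, high: 10.0 }.gt_const(8.0)` yields a Bernoulli distribution with probability 0.25, which mirrors how comparison operators produce Bernoulli results while unsupported cases degrade to `Unknown`.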
## Are these changes tested?
Yes. These changes are covered mostly by unit tests, plus one integration
test.
P.S. Although I opened this PR, @berkaysynnada and @ozankabak put a huge
effort into shaping this change. I want to express my sincere gratitude to
them.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]