Fly-Style opened a new pull request, #14699:
URL: https://github.com/apache/datafusion/pull/14699

   ## Rationale for this change
   
   The Statistics framework in DataFusion is a foundational component for query 
planning and execution. It provides metadata about datasets, which informs 
optimization decisions and influences runtime behavior. This patch 
redesigns the Statistics representation by moving to an 
enum-based structure that supports multiple distribution types, offering 
greater flexibility and expressiveness.
   
   ## What changes are included in this PR?
   
   This patch presents a Statistics v2 framework with the following main 
points:
   - introduces an enum-based structure to support multiple distribution types, 
which initially include:
     - Uniform distribution (interval)
     - Gaussian distribution, parametrized with `mean` and `variance`
     - Exponential distribution, parametrized with `rate` and `offset`
     - Bernoulli distribution, which holds a probability and is used as the 
resulting distribution of comparison operators
     - Unknown distribution, which abstracts any distribution not otherwise 
represented, or is used as a fallback option.
   - revamps the tree-based interval evaluation and propagation for the new 
statistics framework, while still keeping the old statistics in the codebase, 
with support for the most useful binary operators, unary `negate`, and logical 
`not`;
   - introduces new `interval_arithmetic` methods and extends existing ones.
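
To illustrate the idea for reviewers, here is a minimal sketch of what an enum-based distribution type and its propagation through `negate`, `not`, and a comparison could look like. It is not the code in this PR: the `Distribution` name, the plain `f64` fields, the field-less `Unknown` variant, the positive-tail reading of `offset`, and the helper `gt_const` are all assumptions made to keep the example self-contained.

```rust
/// Hypothetical, simplified stand-in for an enum-based statistics type.
/// Plain `f64` fields are an assumption made for this sketch only.
#[derive(Debug, Clone, PartialEq)]
pub enum Distribution {
    /// Uniform over the closed interval [low, high].
    Uniform { low: f64, high: f64 },
    /// Gaussian, parametrized with mean and variance.
    Gaussian { mean: f64, variance: f64 },
    /// Exponential, parametrized with rate and offset
    /// (assumed here to start at `offset` and extend to the right).
    Exponential { rate: f64, offset: f64 },
    /// Probability that a boolean expression evaluates to true.
    Bernoulli { p: f64 },
    /// Fallback when nothing more specific is known (no fields in this sketch).
    Unknown,
}

impl Distribution {
    /// Mean of the distribution, when this sketch can derive it.
    pub fn mean(&self) -> Option<f64> {
        match self {
            Self::Uniform { low, high } => Some((low + high) / 2.0),
            Self::Gaussian { mean, .. } => Some(*mean),
            Self::Exponential { rate, offset } => Some(offset + 1.0 / rate),
            Self::Bernoulli { p } => Some(*p),
            Self::Unknown => None,
        }
    }

    /// Propagate through unary `negate`; unsupported cases fall back to `Unknown`.
    pub fn negate(&self) -> Distribution {
        match self {
            Self::Uniform { low, high } => Self::Uniform { low: -high, high: -low },
            Self::Gaussian { mean, variance } => Self::Gaussian {
                mean: -mean,
                variance: *variance,
            },
            _ => Self::Unknown,
        }
    }

    /// Propagate through logical `not`, which is only meaningful for `Bernoulli`.
    pub fn not(&self) -> Distribution {
        match self {
            Self::Bernoulli { p } => Self::Bernoulli { p: 1.0 - p },
            _ => Self::Unknown,
        }
    }

    /// Hypothetical helper: distribution of `x > threshold`.
    /// Only a non-degenerate `Uniform` input yields a `Bernoulli` here.
    pub fn gt_const(&self, threshold: f64) -> Distribution {
        match self {
            Self::Uniform { low, high } if high > low => {
                let p = ((high - threshold) / (high - low)).clamp(0.0, 1.0);
                Self::Bernoulli { p }
            }
            _ => Self::Unknown,
        }
    }
}
```

With these definitions, a column that is uniform on [0, 10] compared against 2.5 gives `Bernoulli { p: 0.75 }`, and applying `not` to that result gives `Bernoulli { p: 0.25 }`. Falling back to `Unknown` for unsupported cases mirrors the fallback role described in the list above.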
   
   ## Are these changes tested?
   
   Yes, these changes are tested mostly with unit tests, and also with one 
integration test.
   
   P.S. Although I am the one opening this PR, @berkaysynnada and @ozankabak 
put a huge effort into shaping this change. I want to express my sincere 
gratitude to them.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

