alamb opened a new pull request, #22071:
URL: https://github.com/apache/datafusion/pull/22071

   ## Which issue does this PR close?
   
   - Related to #14896
   - Related to #21120
   
   ## Rationale for this change
   
   The "Statistics V2" framework introduced in #14699 (`Distribution` enum,
   `PhysicalExpr::evaluate_statistics`/`propagate_statistics`,
   `ExprStatisticsGraph`, and supporting helpers in
   `datafusion-expr-common::statistics`) was always intended as the foundation
   for replacing `Precision`-based column statistics with richer probabilistic
   distributions (see #14896). In practice it has never been wired into the
   optimizer or any execution operator.
   
   PR #14699 was merged on 2025-02-24, ~15 months ago. Since then, the only
   commits touching `datafusion/expr-common/src/statistics.rs` and
   `datafusion/physical-expr/src/statistics/` have been mechanical:
   Rust-edition migrations, lint-driven refactors, renames of unrelated
   `Interval` constants, and removal of `as_any`. No operator or planner has
   been taught to call `evaluate_statistics` / `propagate_statistics` or to
   construct a `Distribution` outside of the framework's own tests.
   
   Meanwhile, #21120 lays out a different, simpler direction: a pluggable
   `ExpressionAnalyzer` chain-of-responsibility for expression-level
   statistics. That issue explicitly describes the V2 distribution-based API
   as "significantly more complex to implement and adopt" and proposes that
   distribution-based estimation, if useful, be plugged in later as a custom
   analyzer rather than as a `PhysicalExpr` trait surface.
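   
   To make the chain-of-responsibility idea concrete, here is a minimal,
   self-contained sketch of the pattern. It is an illustration only: every
   name in it (`SelectivityAnalyzer`, `EqualityAnalyzer`, `DefaultAnalyzer`,
   `AnalyzerChain`, `default_chain`) is hypothetical, expressions are modeled
   as plain strings, and the selectivity values are made up. It does not
   reflect the API proposed in #21120.
   
   ```rust
   // Hypothetical sketch of a chain-of-responsibility for expression statistics.
   // None of these names come from DataFusion or from issue #21120.

   /// One link in the chain. Returns `Some(selectivity)` when this analyzer
   /// knows how to handle the expression, or `None` to defer to the next link.
   pub trait SelectivityAnalyzer {
       fn analyze(&self, expr: &str) -> Option<f64>;
   }

   /// Toy rule: treat any expression containing " = " as an equality predicate.
   pub struct EqualityAnalyzer;

   impl SelectivityAnalyzer for EqualityAnalyzer {
       fn analyze(&self, expr: &str) -> Option<f64> {
           if expr.contains(" = ") {
               Some(0.1) // made-up selectivity for illustration
           } else {
               None
           }
       }
   }

   /// Fallback link: always answers, assuming the predicate filters nothing.
   pub struct DefaultAnalyzer;

   impl SelectivityAnalyzer for DefaultAnalyzer {
       fn analyze(&self, _expr: &str) -> Option<f64> {
           Some(1.0)
       }
   }

   /// Ordered list of analyzers; the first one that returns `Some` wins.
   pub struct AnalyzerChain {
       analyzers: Vec<Box<dyn SelectivityAnalyzer>>,
   }

   impl AnalyzerChain {
       pub fn new(analyzers: Vec<Box<dyn SelectivityAnalyzer>>) -> Self {
           Self { analyzers }
       }

       pub fn analyze(&self, expr: &str) -> Option<f64> {
           self.analyzers.iter().find_map(|a| a.analyze(expr))
       }
   }

   /// Builds a chain with the specific rule first and the fallback last.
   pub fn default_chain() -> AnalyzerChain {
       AnalyzerChain::new(vec![Box::new(EqualityAnalyzer), Box::new(DefaultAnalyzer)])
   }
   ```
   
   Under this shape, a distribution-based estimator would be one more
   `SelectivityAnalyzer` placed ahead of the fallback, rather than a method
   every `PhysicalExpr` has to implement.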
   
   Rather than continue carrying an unused public framework that we don't
   intend to build on, this PR deprecates it so downstream users don't start
   building on top of it before it is removed.
   
   ## What changes are included in this PR?
   
   This PR adds `#[deprecated(since = "54.0.0", ...)]` attributes to the
   public abstractions introduced in #14699:
   
   - the `Distribution` enum and its variant structs
   - the `evaluate_statistics` / `propagate_statistics` trait methods on
     `PhysicalExpr`
   - `ExprStatisticsGraph` / `ExprStatisticsGraphNode`
   - the supporting helper functions
   
   It also updates internal callers to silence the resulting deprecation
   warnings (with `#[expect(deprecated)]` or module-level
   `#![allow(deprecated)]` where appropriate). The deprecation note on each
   item points to #21120 for the new direction.
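   
   As a rough illustration of the mechanics described above, the sketch below
   shows a deprecated item, an internal caller silenced with
   `#[expect(deprecated)]`, and a module silenced with
   `#![allow(deprecated)]`. The item names (`legacy_estimate`,
   `internal_caller`, `legacy_tests`) and the note text are hypothetical and
   are not taken from the DataFusion source. `#[expect]` requires Rust 1.81 or
   later.
   
   ```rust
   // Hypothetical stand-in for a deprecated public item. The name and the
   // note text are illustrative, not copied from DataFusion.
   #[deprecated(since = "54.0.0", note = "unused framework; see issue #21120")]
   pub fn legacy_estimate(rows: usize) -> usize {
       rows / 2
   }

   // An internal caller that keeps using the item until it is removed.
   // `#[expect(deprecated)]` silences the warning and, unlike `allow`, makes
   // the compiler warn if this function ever stops using a deprecated item.
   #[expect(deprecated)]
   pub fn internal_caller(rows: usize) -> usize {
       legacy_estimate(rows)
   }

   // Module-level form: every item inside may use deprecated APIs silently.
   pub mod legacy_tests {
       #![allow(deprecated)]

       pub fn estimate_via_module(rows: usize) -> usize {
           super::legacy_estimate(rows) + 1
       }
   }
   ```
   
   A downstream crate that calls the deprecated item without either attribute
   still compiles; it only gets a `deprecated` warning carrying the `note`
   text.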
   
   No behavior changes; the V2 code paths still compile and run, so any
   out-of-tree consumer that has already adopted them sees a deprecation
   warning rather than a breakage.
   
   ## Are these changes tested?
   
   No new tests; the existing tests for the deprecated items continue to
   pass.
   
   ## Are there any user-facing changes?
   
   The public API items listed above are now marked `#[deprecated]`.
   Downstream code that uses them will see a compiler warning pointing to
   #21120, but will continue to compile and run unchanged. The deprecated
   items will be removed in a future release.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

