Yes, it makes sense to use the RelSubset selectivity. I think that’s the main issue here, so let’s declare that we have reached consensus.
How that value is arrived at is a different matter. Let’s strive to create good, understandable, deterministic, numerically stable formulas for statistics. Your statement

> considering that statistics propagation is anyway an
> estimation/approximation

isn't helpful if it lets us shrug and accept nondeterministic statistics estimates. A deterministic process for computing metadata is extremely desirable, and I believe it is achievable.

Concretely, if we need to combine several estimates of selectivity, max and min are more stable than avg and median. My hunch is that avg-distinct is more stable than avg, and might be good enough if we have to combine estimates from several sources.

> On May 30, 2026, at 10:47 PM, Alessandro Solimando
> <[email protected]> wrote:
>
> Hi Julian,
> Plans belonging to the same RelSubset being part of the same equivalence
> class, I would expect them to share the same selectivity, as they need to
> filter exactly the same fraction of rows, right?
>
> But the problem is almost certainly "numerically unstable" and the order of
> filters matters as we are dealing with floating point arithmetic.
>
> If that's correct, and considering that statistics propagation is anyway an
> estimation/approximation, it should be reasonable to use the representative
> of the equivalence class (possibly via RelSubset::getBestOrOriginal()) for
> selectivity estimation.
>
> Does that make sense to you?
>
> Best regards,
> Alessandro
>
>
> On Fri, May 29, 2026, 22:31 Julian Hyde <[email protected]> wrote:
>
>> I don’t recall any reasons.
>>
>> Some metadata are easy because they have an ordering. For example, if
>> a predicate holds for one rel in a RelSubset then it applies for all.
>> Therefore the RelSubset’s RelMdPredicates value should be the union of the
>> predicates of all of its constituent rels.
>>
>> (Algebraically, such metadata have a partial ordering, and have an
>> operation to combine values to make one value that is greater than either.
>> I think that makes them monoids and a semilattice.)
>>
>> Selectivity doesn’t have those nice algebraic properties, so maybe we
>> didn’t make a decision about “who should win” if there is a disagreement.
>>
>> Julian
>>
>>
>>> On May 29, 2026, at 2:57 AM, Etienne Pelissier via dev
>>> <[email protected]> wrote:
>>>
>>> Me and my team are considering adding a getSelectivity(RelSubset, …)
>>> override in our codebase and I'd like to check whether there's a known
>>> reason core RelMdSelectivity doesn't do this, i.e. whether we'd be
>>> walking into something the project has already considered and decided
>>> against.
>>>
>>> I checked https://lists.apache.org
>>> <https://lists.apache.org/[email protected]:gte=0d:getSelectivity>
>>> and https://issues.apache.org
>>> <https://issues.apache.org/jira/browse/CALCITE-3298?jql=project%20%3D%20CALCITE%20AND%20text%20~%20getSelectivity>
>>> and don't think this subject has already been discussed there.
>>>
>>> We're planning this override because during Volcano exploration,
>>> mq.getSelectivity(subset, p) for a RelSubset falls to the RelNode
>>> catch-all in RelMdSelectivity and returns
>>> RelMdUtil.guessSelectivity(predicate), a pure function of the
>>> predicate's syntactic shape (per-SqlKind factors multiplied across
>>> conjuncts), with no dependency on the underlying RelNode.
>>>
>>> The override exists in Apache Flink and Apache Drill, which makes its
>>> absence in core feel intentional rather than accidental.
>>>
>>> 1. Is the absence of a RelSubset handler in RelMdSelectivity deliberate?
>>> 2. Are there pitfalls in the Flink/Drill-style override that we'd be
>>> inheriting? Delegating to subset.getBestOrOriginal() seems like the
>>> obvious shape, but I want to make sure I'm not missing a known footgun
>>> before we ship it.
>>> 3. If you've tried this in a Calcite-based engine and hit a problem, I'd
>>> love to hear what.
>>>
>>> Not asking for any changes in core; just trying to sanity-check our
>>> downstream decision before we commit to it.
>>>
>>> Thanks,
>>> Etienne Pelissier
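
To make the point about combining several selectivity estimates concrete, here is a minimal sketch in plain Java. It is not Calcite code: the class SelectivityCombiner and its methods are hypothetical names chosen for illustration, and sorting the distinct values before summing them in avgDistinct is an assumption of this sketch, added so that the result does not depend on the order in which the estimates arrive.

```java
import java.util.List;

/** Hypothetical helper, not part of Calcite: ways to combine selectivity estimates. */
public final class SelectivityCombiner {
  private SelectivityCombiner() {}

  /** Smallest estimate. min is commutative and associative, so input order is irrelevant. */
  public static double min(List<Double> estimates) {
    requireNonEmpty(estimates);
    double result = Double.POSITIVE_INFINITY;
    for (double e : estimates) {
      result = Math.min(result, e);
    }
    return result;
  }

  /** Largest estimate. Order-independent for the same reason as min. */
  public static double max(List<Double> estimates) {
    requireNonEmpty(estimates);
    double result = Double.NEGATIVE_INFINITY;
    for (double e : estimates) {
      result = Math.max(result, e);
    }
    return result;
  }

  /**
   * Plain left-to-right average. Floating-point addition is not associative,
   * so the last bits of the result can depend on the order of the estimates,
   * and every duplicate estimate shifts the value.
   */
  public static double avg(List<Double> estimates) {
    requireNonEmpty(estimates);
    double sum = 0d;
    for (double e : estimates) {
      sum += e;
    }
    return sum / estimates.size();
  }

  /**
   * Average of the distinct values. Assumption of this sketch: the distinct
   * values are sorted before summing, so the result depends only on the set
   * of values, not on their order or on how often each one occurs.
   */
  public static double avgDistinct(List<Double> estimates) {
    requireNonEmpty(estimates);
    double[] distinct = estimates.stream()
        .mapToDouble(Double::doubleValue)
        .distinct()
        .sorted()
        .toArray();
    double sum = 0d;
    for (double e : distinct) {
      sum += e;
    }
    return sum / distinct.length;
  }

  private static void requireNonEmpty(List<Double> estimates) {
    if (estimates.isEmpty()) {
      throw new IllegalArgumentException("no estimates to combine");
    }
  }
}
```

For example, SelectivityCombiner.avgDistinct(List.of(0.25, 0.5, 0.25)) returns 0.375 however the three estimates are ordered, whereas avg over the same list also depends on how many rels happened to report 0.25.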
