Re: Sanity-check on RelSubset selectivity override

Julian Hyde Mon, 01 Jun 2026 11:37:26 -0700

Yes, it makes sense to use the RelSubset selectivity. I think that’s the main 
issue here, so let’s declare that we have reached consensus.


How that value is arrived at is a different matter. Let’s strive to create 
good, understandable, deterministic, numerically stable formulas for 
statistics. Your statement

> considering that statistics propagation is anyway an
> estimation/approximation

isn't helpful if it lets us shrug and accept nondeterministic statistics 
estimates. A deterministic process for computing metadata is extremely 
desirable, and I believe it is achievable. 

Concretely, if we need to combine several estimates of selectivity, max and min 
are more stable than avg and median. My hunch is that avg-distinct is more 
stable than avg, and might be good enough if we have to combine estimates from 
several sources.
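
A minimal sketch of the point above, for illustration only. The class
SelectivityCombiner and its methods are hypothetical and not part of Calcite.
It assumes a non-empty list of estimates, and summing the distinct values in
sorted order is just one assumed way to make avg-distinct independent of input
order. min and max only compare values, so they never round; a naive avg rounds
at each addition, so the same estimates in a different order can give a
different result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper (not Calcite API): ways to combine selectivity estimates. */
public class SelectivityCombiner {

  /** Minimum: uses comparisons only, so the result is independent of input order. */
  public static double min(List<Double> estimates) {
    double result = Double.POSITIVE_INFINITY;
    for (double e : estimates) {
      result = Math.min(result, e);
    }
    return result;
  }

  /** Maximum: uses comparisons only, so the result is independent of input order. */
  public static double max(List<Double> estimates) {
    double result = Double.NEGATIVE_INFINITY;
    for (double e : estimates) {
      result = Math.max(result, e);
    }
    return result;
  }

  /** Naive average: the running sum rounds at each step, so order can matter. */
  public static double avg(List<Double> estimates) {
    double sum = 0.0;
    for (double e : estimates) {
      sum += e;
    }
    return sum / estimates.size();
  }

  /**
   * Average of distinct values, summed in ascending order. Sorting fixes the
   * summation order, so the result does not depend on the order of the input.
   */
  public static double avgDistinct(List<Double> estimates) {
    TreeSet<Double> distinct = new TreeSet<>(estimates);
    double sum = 0.0;
    for (double e : distinct) {
      sum += e;
    }
    return sum / distinct.size();
  }

  public static void main(String[] args) {
    // The same three estimates, visited in two different orders.
    List<Double> a = Arrays.asList(0.1, 0.2, 0.3);
    List<Double> b = Arrays.asList(0.3, 0.2, 0.1);
    System.out.println("min equal:         " + (min(a) == min(b)));
    System.out.println("max equal:         " + (max(a) == max(b)));
    System.out.println("avg equal:         " + (avg(a) == avg(b)));
    System.out.println("avgDistinct equal: " + (avgDistinct(a) == avgDistinct(b)));
  }
}
```

Whether avg-distinct is good enough in practice is a separate question; the
sketch only shows which combiners depend on the order in which the estimates
are visited.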

> On May 30, 2026, at 10:47 PM, Alessandro Solimando 
> <[email protected]> wrote:
> 
> Hi Julian,
> Since plans belonging to the same RelSubset are part of the same equivalence
> class, I would expect them to share the same selectivity, as they need to
> filter exactly the same fraction of rows, right?
> 
> But the problem is almost certainly "numerically unstable", and the order of
> filters matters, as we are dealing with floating-point arithmetic.
> 
> If that's correct, and considering that statistics propagation is anyway an
> estimation/approximation, it should be reasonable to use the representative
> of the equivalence class (possibly via RelSubset::getBestOrOriginal()) for
> selectivity estimation.
> 
> Does that make sense to you?
> 
> Best regards,
> Alessandro
> 
> 
> On Fri, May 29, 2026, 22:31 Julian Hyde <[email protected]> wrote:
> 
>> I don’t recall any reasons.
>> 
>> Some metadata are easy because they have an ordering. For example, if
>> a predicate holds for one rel in a RelSubset then it applies for all.
>> Therefore the RelSubset’s RelMdPredicates value should be the union of the
>> predicates of all of its constituent rels.
>> 
>> (Algebraically, such metadata have a partial ordering, and have an
>> operation to combine values to make one value that is greater than either.
>> I think that makes them monoids and a semilattice.)
>> 
>> Selectivity doesn’t have those nice algebraic properties, so maybe we
>> didn’t make a decision about “who should win” if there is a disagreement.
>> 
>> Julian
>> 
>> 
>>> On May 29, 2026, at 2:57 AM, Etienne Pelissier via dev <[email protected]> wrote:
>>> 
>>> My team and I are considering adding a getSelectivity(RelSubset, …)
>>> override in our codebase and I'd like to check whether there's a known
>>> reason core RelMdSelectivity doesn't do this, i.e. whether we'd be walking
>>> into something the project has already considered and decided against.
>>> 
>>> I checked https://lists.apache.org
>>> <https://lists.apache.org/[email protected]:gte=0d:getSelectivity>
>>> and https://issues.apache.org
>>> <https://issues.apache.org/jira/browse/CALCITE-3298?jql=project%20%3D%20CALCITE%20AND%20text%20~%20getSelectivity>
>>> and don't think this subject has already been discussed there.
>>> 
>>> We're planning this override because during Volcano exploration,
>>> mq.getSelectivity(subset, p) for a RelSubset falls to the RelNode catch-all
>>> in RelMdSelectivity and returns RelMdUtil.guessSelectivity(predicate), a pure
>>> function of the predicate's syntactic shape (per-SqlKind factors multiplied
>>> across conjuncts), with no dependency on the underlying RelNode.
>>> 
>>> The override exists in Apache Flink and Apache Drill, which makes its
>>> absence in core feel intentional rather than accidental.
>>> 
>>> 1. Is the absence of a RelSubset handler in RelMdSelectivity deliberate?
>>> 2. Are there pitfalls in the Flink/Drill-style override that we'd be
>>> inheriting? Delegating to subset.getBestOrOriginal() seems like the obvious
>>> shape, but I want to make sure I'm not missing a known footgun before we
>>> ship it.
>>> 3. If you've tried this in a Calcite-based engine and hit a problem, I'd
>>> love to hear what.
>>> 
>>> Not asking for any changes in core — just trying to sanity-check our
>>> downstream decision before we commit to it.
>>> 
>>> Thanks,
>>> Etienne Pelissier
>> 
>>
