Hi Julian,
Since plans belonging to the same RelSubset are part of the same equivalence
class, I would expect them to share the same selectivity, as they need to
filter exactly the same fraction of rows, right?

But the computation is almost certainly "numerically unstable", and the order
of filters matters as we are dealing with floating-point arithmetic.
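
To illustrate the floating-point point, here is a minimal sketch in plain Java.
It is not Calcite code: the class name, the product() helper and the factor
values are made up for the example. It multiplies the same per-conjunct
factors in two different orders; the two results can differ in the last bits
even though they are mathematically equal.

```java
// Hypothetical demo class, not part of Calcite.
class SelectivityOrderDemo {

    // Multiplies the factors left to right, the way a chain of
    // per-conjunct selectivity factors might be combined.
    static double product(double... factors) {
        double result = 1.0;
        for (double f : factors) {
            result *= f;
        }
        return result;
    }

    public static void main(String[] args) {
        // Same three made-up factors, applied in opposite orders.
        double forward = product(0.1, 0.2, 0.3);
        double backward = product(0.3, 0.2, 0.1);

        System.out.println("forward  = " + forward);
        System.out.println("backward = " + backward);
        // Exact equality is not guaranteed; compare with a tolerance instead.
        System.out.println("bitwise equal: " + (forward == backward));
        System.out.println("close enough:  "
            + (Math.abs(forward - backward) <= 1e-15));
    }
}
```

So two equivalent plans that apply the same filters in a different order may
report slightly different selectivities, which is a rounding effect rather
than a real disagreement.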

If that's correct, and considering that statistics propagation is an
estimation/approximation anyway, it should be reasonable to use the
representative of the equivalence class (possibly via
RelSubset::getBestOrOriginal()) for selectivity estimation.
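
As a rough sketch of the idea, using toy types rather than Calcite's classes:
Plan, Subset, bestOrOriginal() and selectivity() below are hypothetical
stand-ins and only mimic the "best plan if known, otherwise the original"
lookup. Every query against the subset is answered by one representative, so
the subset reports a single value.

```java
// Toy model only; none of these types are Calcite classes.
class SubsetSelectivityDemo {

    /** Hypothetical stand-in for a plan node that can estimate selectivity. */
    interface Plan {
        double selectivity();
    }

    /** Hypothetical stand-in for a subset of equivalent plans. */
    static final class Subset {
        private final Plan original; // first plan registered, never null here
        private final Plan best;     // cheapest plan found so far, may be null

        Subset(Plan original, Plan best) {
            this.original = original;
            this.best = best;
        }

        // Representative of the equivalence class: best if known, else original.
        Plan bestOrOriginal() {
            return best != null ? best : original;
        }
    }

    // Subset-level estimate: delegate to the representative.
    static double selectivity(Subset subset) {
        return subset.bestOrOriginal().selectivity();
    }

    public static void main(String[] args) {
        Subset unexplored = new Subset(() -> 0.25, null);
        Subset explored = new Subset(() -> 0.25, () -> 0.2);
        System.out.println(selectivity(unexplored)); // uses the original plan
        System.out.println(selectivity(explored));   // uses the best plan
    }
}
```

One consequence visible even in the toy version: the value reported for a
subset can change once a best plan is found, so the estimate is stable only
between updates of the representative.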

Does that make sense to you?

Best regards,
Alessandro


On Fri, May 29, 2026, 22:31 Julian Hyde <[email protected]> wrote:

> I don’t recall any reasons.
>
> Some metadata are easy because they have an ordering. For example, if
> a predicate holds for one rel in a RelSubset then it applies for all.
> Therefore the RelSubset’s RelMdPredicates value should be the union of the
> predicates of all of its constituent rels.
>
> (Algebraically, such metadata have a partial ordering, and have an
> operation to combine values to make one value that is greater than either.
> I think that makes them monoids and a semilattice.)
>
> Selectivity doesn’t have those nice algebraic properties, so maybe we
> didn’t make a decision about “who should win” if there is a disagreement.
>
> Julian
>
>
> > On May 29, 2026, at 2:57 AM, Etienne Pelissier via dev <
> [email protected]> wrote:
> >
> > My team and I are considering adding a getSelectivity(RelSubset, …)
> > override in our codebase, and I'd like to check whether there's a known
> > reason core RelMdSelectivity doesn't do this, i.e. whether we'd be walking
> > into something the project has already considered and decided against.
> >
> > I checked https://lists.apache.org
> > <
> https://lists.apache.org/[email protected]:gte=0d:getSelectivity
> >
> > and https://issues.apache.org
> > <
> https://issues.apache.org/jira/browse/CALCITE-3298?jql=project%20%3D%20CALCITE%20AND%20text%20~%20getSelectivity
> >
> > and
> > don't think this subject has already been discussed there.
> >
> > We're planning this override because during Volcano exploration,
> > mq.getSelectivity(subset, p) for a RelSubset falls to the RelNode
> > catch-all in RelMdSelectivity and returns
> > RelMdUtil.guessSelectivity(predicate), a pure function of the
> > predicate's syntactic shape (per-SqlKind factors multiplied across
> > conjuncts), with no dependency on the underlying RelNode.
> >
> > The override exists in Apache Flink and Apache Drill, which makes its
> > absence in core feel intentional rather than accidental.
> >
> > 1. Is the absence of a RelSubset handler in RelMdSelectivity deliberate?
> > 2. Are there pitfalls in the Flink/Drill-style override that we'd be
> > inheriting? Delegating to subset.getBestOrOriginal() seems like the obvious
> > shape, but I want to make sure I'm not missing a known footgun before we
> > ship it.
> > 3. If you've tried this in a Calcite-based engine and hit a problem, I'd
> > love to hear what.
> >
> > Not asking for any changes in core — just trying to sanity-check our
> > downstream decision before we commit to it.
> >
> > Thanks,
> > Etienne Pelissier
>
>
