Many of the default selectivity formulas are simple but poor. I totally agree that your approach - treating BETWEEN as one range condition rather than an AND of two independent conditions - is superior. Of course you can override it in your class but it would be better if you could contribute it back to Calcite.
Julian > On Feb 23, 2019, at 10:07 PM, Aman Sinha <[email protected]> wrote: > > Hi devs, > I am trying to estimate the selectivity of BETWEEN predicates using > histograms. Calcite will convert it to a conjunction. > e.g WHERE c1 BETWEEN 10 and 20 ==> WHERE c1 >= 10 AND c1 <= 20 > > The question is : what's the formula for the selectivity of the top level > AND expression ? Since these are not conditions on independent columns, I > don't want to multiply selectivities of individual conjuncts. Ideally, I > want to supply the [low, high] values of the range to my histogram and have > it return the selectivity based on bucket boundaries. > > However, looking at the code in RelMdSelectivity.java and > RelMdUtil.guessSelectivity(), the behavior is to treat each conjunct > independently. I can over-ride the relevant methods in my derived class > and implement the selectivity calculation but I am wondering if there's > some place else in Calcite that deals with such calculation. > > thanks, > Aman
