Yes, it would make sense to have this logic reside in Calcite - at least the core part that does not depend on the histogram. I will capture some thoughts on this in a JIRA. Thanks for your input.
Aman

On Mon, Feb 25, 2019 at 3:48 PM Julian Hyde <jh...@apache.org> wrote:

> Many of the default selectivity formulas are simple but poor.
>
> I totally agree that your approach - treating BETWEEN as one range
> condition rather than an AND of two independent conditions - is superior.
> Of course you can override it in your class but it would be better if you
> could contribute it back to Calcite.
>
> Julian
>
>
> > On Feb 23, 2019, at 10:07 PM, Aman Sinha <amansi...@gmail.com> wrote:
> >
> > Hi devs,
> > I am trying to estimate the selectivity of BETWEEN predicates using
> > histograms. Calcite will convert it to a conjunction.
> > e.g WHERE c1 BETWEEN 10 and 20 ==> WHERE c1 >= 10 AND c1 <= 20
> >
> > The question is : what's the formula for the selectivity of the top level
> > AND expression ? Since these are not conditions on independent columns, I
> > don't want to multiply selectivities of individual conjuncts. Ideally, I
> > want to supply the [low, high] values of the range to my histogram and have
> > it return the selectivity based on bucket boundaries.
> >
> > However, looking at the code in RelMdSelectivity.java and
> > RelMdUtil.guessSelectivity(), the behavior is to treat each conjunct
> > independently. I can over-ride the relevant methods in my derived class
> > and implement the selectivity calculation but I am wondering if there's
> > some place else in Calcite that deals with such calculation.
> >
> > thanks,
> > Aman
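
For illustration, here is a minimal sketch of the difference discussed in this thread: estimating c1 BETWEEN 10 AND 20 as a single range looked up in a histogram, versus multiplying the selectivities of the two conjuncts as if they were independent. Everything in it is an assumption made for the example: the class RangeSelectivitySketch, the nested EquiDepthHistogram, and the methods cumulativeFraction, rangeSelectivity and independentSelectivity are hypothetical names, not Calcite classes or APIs, and the sketch does not call RelMdSelectivity or RelMdUtil. It also assumes an equi-depth histogram (every bucket holds the same fraction of rows) with linear interpolation inside a bucket, and it treats values as continuous, so it does not distinguish <= from <.

```java
/** Hypothetical sketch; none of these names exist in Calcite. */
final class RangeSelectivitySketch {

  /** Assumed equi-depth histogram: each bucket holds the same fraction of rows. */
  static final class EquiDepthHistogram {
    // bounds[i] .. bounds[i + 1] delimit bucket i; values must be ascending.
    private final double[] bounds;

    EquiDepthHistogram(double[] bounds) {
      if (bounds.length < 2) {
        throw new IllegalArgumentException("need at least one bucket");
      }
      this.bounds = bounds.clone();
    }

    /** Estimated fraction of rows with value <= v (linear within a bucket). */
    double cumulativeFraction(double v) {
      int buckets = bounds.length - 1;
      if (v <= bounds[0]) {
        return 0.0;
      }
      if (v >= bounds[buckets]) {
        return 1.0;
      }
      for (int i = 0; i < buckets; i++) {
        if (v < bounds[i + 1]) {
          // v lies in [bounds[i], bounds[i + 1]), so the width is positive.
          double within = (v - bounds[i]) / (bounds[i + 1] - bounds[i]);
          return (i + within) / buckets;
        }
      }
      return 1.0; // not reached for ascending bounds
    }

    /** Selectivity of low <= c <= high, treated as ONE range condition. */
    double rangeSelectivity(double low, double high) {
      if (low > high) {
        return 0.0; // empty range
      }
      return cumulativeFraction(high) - cumulativeFraction(low);
    }
  }

  /** Selectivity if c >= low and c <= high are (wrongly) assumed independent. */
  static double independentSelectivity(EquiDepthHistogram h, double low, double high) {
    double geLow = 1.0 - h.cumulativeFraction(low);
    double leHigh = h.cumulativeFraction(high);
    return geLow * leHigh;
  }

  public static void main(String[] args) {
    // Four equi-depth buckets: [0,10), [10,20), [20,30), [30,40].
    EquiDepthHistogram h = new EquiDepthHistogram(new double[] {0, 10, 20, 30, 40});
    System.out.println("range estimate:       " + h.rangeSelectivity(10, 20));
    System.out.println("independent estimate: " + independentSelectivity(h, 10, 20));
  }
}
```

With the four-bucket histogram in main, the range covers exactly one bucket, so the range estimate is 0.25, while the independence assumption gives (1 - 0.25) * 0.5 = 0.375. The overestimate comes from the two conjuncts being conditions on the same column.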