The simplifications you present would also of course assume that the column has integer type. In any case, overall the proposal seems solid to me.
-- Michael Mior [email protected] Le lun. 10 août 2020 à 04:21, Fan Liya <[email protected]> a écrit : > > Hi Julian, > > Thanks for opening the discussion. > > In general, I am convinced of the value and importance of the proposal. > > I want to discuss more about the algebra involved, as the simplified range > sets may not have the minimum computational costs. > Some examples: > > 1. Suppose we have expression x in (1, 2, 4, 5, 6). > > The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >= > 4 and x <= 6), which represents 4 comparisons and 3 logical operations. > Another equivalent expression is x = 1 or x = 2 or x = 4 or x = 5 or x = 6, > which represents 5 comparisons and 4 logical operations. > It's hard to say which one has a smaller computational cost. > > 2. Suppose we have expression x in (1, 2, 4, 5) > The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >= > 4 and x <= 5). > Another (possibly cheaper) reduction should be x >= 1 and x <= 5 and x != 3. > I am not sure if the range set implementation supports this type of > simplification. > > Therefore, to address above problems, > 1. We may need a model to estimate the cost, and the simplification process > should be guided by the cost model. > 2. Maybe we need to extend the range set implementation so it produces > expressions with minimum computational cost. > > Best, > Liya Fan > > > > > > On Mon, Aug 10, 2020 at 3:27 AM Julian Hyde <[email protected]> wrote: > > > We have had several discussions over the years about how to represent > > IN-lists (e.g. “x IN (1, 3, 5)”) in RexNode-land. > > > > I have generally taken the position that we should expand to ORs (e.g. “x > > = 1 OR x = 3 OR x = 5”) but a few months ago accepted that we should allow > > IN in RexCall. > > > > I have given this some further thought, as part of a couple of RexSimplify > > bugs [1] [2] and I now think we should replace IN with something more > > powerful, namely sargs. A sarg (or search argument [3]) is an ordered set > > of intervals, and can be represented as a Guava ImmutableRangeSet, such as > > "[[0‥1), (1‥2], [3‥3], (5‥+∞)]". It can represent an IN-list of constants, > > but also ranges. > > > > Today you would write > > > > RexCall(IN, RexInputRef(0), RexLiteral(1), RexLiteral(3), RexLiteral(5)) > > > > With sargs, you would instead write > > > > RexCall(IN_SARG, RexInputRef(0), RexSarg(ImmutableRangeSet(“[[1..1], > > [3..3], [5..5]]”))) > > > > There is a new operator IN_SARG, and a new node type RexSarg (it could > > almost be a RexLiteral). > > > > Sargs (and Guava RangeSets) have a powerful and consistent algebra, so if > > we invest in sarg support in RexSimplify and RelOptPredicateList, that > > investment is likely to pay dividends in better plans. > > > > Guava RangeSets, and hence sargs, have support for discrete domains, so > > they can easily optimize ">2 AND <4" to "3”. > > > > Sargs would be the preferred (therefore canonical) form for any AND or OR > > list that has more than one comparison on the same operand (e.g. "x > 3 AND > > x < 17 AND x <> 10”). > > > > This proposal would subsume IN and therefore we would stop supporting IN > > in RexCall. > > > > Julian > > > > [1] https://issues.apache.org/jira/browse/CALCITE-4155 < > > https://issues.apache.org/jira/browse/CALCITE-4155> > > > > [2] https://github.com/julianhyde/calcite/tree/4159-simplify < > > https://github.com/julianhyde/calcite/tree/4159-simplify> > > > > [3] https://en.wikipedia.org/wiki/Sargable < > > https://en.wikipedia.org/wiki/Sargable>
