The simplifications you present would also of course assume that the
column has integer type. In any case, overall the proposal seems solid
to me.

--
Michael Mior
[email protected]

Le lun. 10 août 2020 à 04:21, Fan Liya <[email protected]> a écrit :
>
> Hi Julian,
>
> Thanks for opening the discussion.
>
> In general, I am convinced of the value and importance of the proposal.
>
> I want to discuss more about the algebra involved, as the simplified range
> sets may not have the minimum computational costs.
> Some examples:
>
> 1. Suppose we have expression x in (1, 2, 4, 5, 6).
>
> The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
> 4 and x <= 6), which represents 4 comparisons and 3 logical operations.
> Another equivalent expression is x = 1 or x = 2 or x = 4 or x = 5 or x = 6,
> which represents 5 comparisons and 4 logical operations.
> It's hard to say which one has a smaller computational cost.
>
> 2. Suppose we have expression x in (1, 2, 4, 5)
> The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
> 4 and x <= 5).
> Another (possibly cheaper) reduction should be x >= 1 and x <= 5 and x != 3.
> I am not sure if the range set implementation supports this type of
> simplification.
>
> Therefore, to address above problems,
> 1. We may need a model to estimate the cost, and the simplification process
> should be guided by the cost model.
> 2. Maybe we need to extend the range set implementation so it produces
> expressions with minimum computational cost.
>
> Best,
> Liya Fan
>
>
>
>
>
> On Mon, Aug 10, 2020 at 3:27 AM Julian Hyde <[email protected]> wrote:
>
> > We have had several discussions over the years about how to represent
> > IN-lists (e.g. “x IN (1, 3, 5)”) in RexNode-land.
> >
> > I have generally taken the position that we should expand to ORs (e.g. “x
> > = 1 OR x = 3 OR x = 5”) but a few months ago accepted that we should allow
> > IN in RexCall.
> >
> > I have given this some further thought, as part of a couple of RexSimplify
> > bugs [1] [2] and I now think we should replace IN with something more
> > powerful, namely sargs. A sarg (or search argument [3]) is an ordered set
> > of intervals, and can be represented as a Guava ImmutableRangeSet, such as
> > "[[0‥1), (1‥2], [3‥3], (5‥+∞)]". It can represent an IN-list of constants,
> > but also ranges.
> >
> > Today you would write
> >
> >   RexCall(IN, RexInputRef(0), RexLiteral(1), RexLiteral(3), RexLiteral(5))
> >
> > With sargs, you would instead write
> >
> >   RexCall(IN_SARG, RexInputRef(0), RexSarg(ImmutableRangeSet(“[[1..1],
> > [3..3], [5..5]]”)))
> >
> > There is a new operator IN_SARG, and a new node type RexSarg (it could
> > almost be a RexLiteral).
> >
> > Sargs (and Guava RangeSets) have a powerful and consistent algebra, so if
> > we invest in sarg support in RexSimplify and RelOptPredicateList, that
> > investment is likely to pay dividends in better plans.
> >
> > Guava RangeSets, and hence sargs, have support for discrete domains, so
> > they can easily optimize ">2 AND <4" to "3”.
> >
> > Sargs would be the preferred (therefore canonical) form for any AND or OR
> > list that has more than one comparison on the same operand (e.g. "x > 3 AND
> > x < 17 AND x <> 10”).
> >
> > This proposal would subsume IN and therefore we would stop supporting IN
> > in RexCall.
> >
> > Julian
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-4155 <
> > https://issues.apache.org/jira/browse/CALCITE-4155>
> >
> > [2] https://github.com/julianhyde/calcite/tree/4159-simplify <
> > https://github.com/julianhyde/calcite/tree/4159-simplify>
> >
> > [3] https://en.wikipedia.org/wiki/Sargable <
> > https://en.wikipedia.org/wiki/Sargable>

Reply via email to