We have had several discussions over the years about how to represent IN-lists 
(e.g. “x IN (1, 3, 5)”) in RexNode-land.

I have generally taken the position that we should expand to ORs (e.g. “x = 1 
OR x = 3 OR x = 5”) but a few months ago accepted that we should allow IN in 
RexCall.

I have given this some further thought, as part of a couple of RexSimplify bugs 
[1] [2] and I now think we should replace IN with something more powerful, 
namely sargs. A sarg (or search argument [3]) is an ordered set of intervals, 
and can be represented as a Guava ImmutableRangeSet, such as "[[0‥1), (1‥2], 
[3‥3], (5‥+∞)]". It can represent an IN-list of constants, but also ranges.

Today you would write

  RexCall(IN, RexInputRef(0), RexLiteral(1), RexLiteral(3), RexLiteral(5))

With sargs, you would instead write

  RexCall(IN_SARG, RexInputRef(0), RexSarg(ImmutableRangeSet(“[[1..1], [3..3], 
[5..5]]”)))

There is a new operator IN_SARG, and a new node type RexSarg (it could almost 
be a RexLiteral).

Sargs (and Guava RangeSets) have a powerful and consistent algebra, so if we 
invest in sarg support in RexSimplify and RelOptPredicateList, that investment 
is likely to pay dividends in better plans.

Guava RangeSets, and hence sargs, have support for discrete domains, so they 
can easily optimize ">2 AND <4" to "3”.

Sargs would be the preferred (therefore canonical) form for any AND or OR list 
that has more than one comparison on the same operand (e.g. "x > 3 AND x < 17 
AND x <> 10”).
 
This proposal would subsume IN and therefore we would stop supporting IN in 
RexCall.

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-4155 
<https://issues.apache.org/jira/browse/CALCITE-4155>

[2] https://github.com/julianhyde/calcite/tree/4159-simplify 
<https://github.com/julianhyde/calcite/tree/4159-simplify> 

[3] https://en.wikipedia.org/wiki/Sargable 
<https://en.wikipedia.org/wiki/Sargable> 

Reply via email to