I wanted to reply and share our recent requirement for handling SQL such
as `error_code IN (3002, 3030)' and the challenges we faced. In our
implementation on top of Apache Kudu, each disjunction creates a
`Scanner', a resource we need to limit because it is a denial-of-service
attack vector (e.g. too many scanners and the heap fills up). The good
news for us is that Kudu ships with an [`inListPredicate'], and we
expected the plan to contain a call with `SqlKind.IN' that we could
translate into `inListPredicate'. We were surprised when it didn't. We
did eventually make this work for our customers with the hack below, but
it does not produce a valid plan (for instance, unparsing the plan
produces an invalid SQL query) and is therefore brittle (but *works*
:fingers-crossed:).

┌────
│ // This is not the correct use of Array.
│ final RelDataType listType =
│     builder.getTypeFactory().createArrayType(fieldType, -1);
│ return builder.call(SqlStdOperatorTable.IN,
│          builder.field(conditionTableName, columnName),
│          rexBuilder.makeLiteral(resultValue, listType, true));
└────
We filed a ticket to do it the correct way, which is to take all the
disjunctions and "un-parse" them into `inListPredicate' calls *if
possible*. The code for this struck us as pretty dense, *but* it would
also apply to other disjunctions.
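
To make the idea in that ticket concrete, here is a minimal sketch in
plain Java. It is a toy model, not Calcite or Kudu code: the `Equals'
and `InList' classes and the `collapse' helper are hypothetical names
made up for illustration, and it assumes "if possible" means that every
disjunct is an equality on the same column. A real version would have to
walk `RexNode' trees and handle types and nulls, which this does not
attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Toy model for illustration only; these are not Calcite or Kudu classes. */
final class InListSketch {

  /** One disjunct of the form "column = value". */
  static final class Equals {
    final String column;
    final Object value;

    Equals(String column, Object value) {
      this.column = column;
      this.value = value;
    }
  }

  /** The collapsed form "column IN (values...)". */
  static final class InList {
    final String column;
    final List<Object> values;

    InList(String column, List<Object> values) {
      this.column = column;
      this.values = values;
    }

    @Override public String toString() {
      return column + " IN " + values;
    }
  }

  /**
   * Collapses "c = v1 OR c = v2 OR ..." into "c IN (v1, v2, ...)".
   * Returns empty when the list is empty or the disjuncts mention more
   * than one column, because a single in-list cannot express that filter.
   */
  static Optional<InList> collapse(List<Equals> disjuncts) {
    if (disjuncts.isEmpty()) {
      return Optional.empty();
    }
    String column = disjuncts.get(0).column;
    List<Object> values = new ArrayList<>();
    for (Equals e : disjuncts) {
      if (!e.column.equals(column)) {
        return Optional.empty(); // mixed columns: keep the original OR
      }
      if (!values.contains(e.value)) {
        values.add(e.value); // drop duplicate literals, keep first-seen order
      }
    }
    return Optional.of(new InList(column, values));
  }

  public static void main(String[] args) {
    List<Equals> sameColumn = Arrays.asList(
        new Equals("error_code", 3002),
        new Equals("error_code", 3030));
    System.out.println(
        collapse(sameColumn).map(InList::toString).orElse("not collapsible"));
  }
}
```

In this sketch a collapsed `InList' would map to one `inListPredicate'
and therefore one `Scanner', while an empty result means the caller
still has to fall back to one scanner per disjunct.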

It would be *great* if Calcite shipped with a `RexCall' that our
implementation could translate with little effort into an
`inListPredicate'.


[`inListPredicate']
https://kudu.apache.org/apidocs/org/apache/kudu/client/KuduPredicate.html#newInListPredicate-org.apache.kudu.ColumnSchema-java.util.List-

On Mon, Jul 20, 2020 at 3:09 PM Stamatis Zampetakis <[email protected]>
wrote:

> Another quick thought as far as it concerns the IN operator would be to use
> RexCall as it is right now where the first operand in the list is a
> RexInputRef for instance and the rest are the literals.
> I assume that taking this direction would need to change a bit the
> respective SqlOperator.
>
> I haven't thought of this thoroughly so maybe there are important things
> that I am missing.
>
> Best,
> Stamatis
>
>
> On Tue, Jul 21, 2020 at 12:41 AM Julian Hyde <[email protected]> wrote:
>
> > The name isn't very intuitive.
> >
> > The concept of a list and a comparison operator seems OK. As Vladimir
> > points out, it is somewhat similar to RexSubQuery, so maybe this could
> > be a sub-class (but organizing the data a bit more efficiently).
> >
> > I would be very wary of null semantics. RexNode scalar operators are
> > forced to do 3-valued logic, but this is almost a relational operator
> > and it would be better without that burden.
> >
> > Julian
> >
> >
> >
> > On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov
> > <[email protected]> wrote:
> > >
> > > >Do you know what is the impact on Enumerable implementation?
> > >
> > > I guess there are plenty of options there.
> > >
> > > > The key question regarding RexListCmp is as we introduce a new Rex
> > > > node, all the planning rules and all engines must support it somehow.
> > >
> > > Technically speaking, we have RexSubQuery.
> > > > Haisheng, have you considered an option to stick with RexSubQuery to
> > > > avoid having two more-or-less the same rex classes?
> > >
> > > Vladimir
> >
>
