Hi Zoltan,

Thanks for your reply.

I also found CALCITE-1321
<https://issues.apache.org/jira/browse/CALCITE-1321> which is related
(defining threshold parameter) but not the same. The discussion there
points the preference for planner rule (instead of SqlToRelConverter). Do
you know how complicated it is ?

Regards

On Thu, Aug 23, 2018 at 9:58 AM Zoltan Haindrich <[email protected]> wrote:

> Hello Andrei!
>
> I've also bumped into this earlier - from a little bit different angle :)
>
> I've found some "back story about it": It works currently like this
> because a long time ago there was no other way to do the subquery
> conversion of IN;
> I think now the preferred way would be to have a rule to do that
> conversion...
>
> About (3) leaving INs as is: I've experimented with that earlier and it
> seems like there are some places where IN is handled only as a subquery.
> If we don't open INs; then there is a chance that some expression
> simplification will not happen (because it's not handled there either).
>
> For Hive we've decided to follow the Calcite way for now; so Hive also
> opens ORs before it calls Calcite - but there's also a rule which tries to
> close ORs into an IN.
>
> https://github.com/apache/hive/blob/236a32c645a21b04ccaf7f18db5c6a5aa53586e8/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java#L217
>
> I plan to get back to this; or at least get CALCITE-2444.
>
> cheers,
> Zoltan
>
>
> On 08/14/2018 07:38 AM, Andrei Sereda wrote:
> > Hello,
> >
> > I have noticed that our queries get converted into full table scans when
> > they contain IN predicates with long list of elements (eg. id IN
> ($manyIds)
> > ).
> >
> > Upon some debuging the reason seems to be
> > SqlToRelConverter.substituteSubQuery()
> > <
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1001
> >.
> > Calcite will automatically translate subqueries into joins when
> > DEFAULT_IN_SUB_QUERY_THRESHOLD
> > <
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L221
> >
> > is
> > reached (currently 20 elements). We have instances when $manyIds can have
> > 10K of elements (out of 50M rows table).
> >
> > Some questions:
> >
> >     1.
> >
> >     In order to skip this query translation seems like I have to change
> >     SqlToRelConverter.Config.InSubQueryThreshold to MAX_VALUE. How can
> one
> >     do it in CalciteConnection or PreparedStatement ? Is is the only way
> ?
> >     2.
> >
> >     Is SubQueryConverter
> >     <
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SubQueryConverter.java
> >
> >     of any use in this particular case ?
> >     3.
> >
> >     Can one keep IN predicate "as is" without converting to disjunctions
> >     (see convertInToOr
> >     <
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1396
> >)
> >     ?
> >     The reason I’m asking is because our elastic queries get pretty big
> if
> >     constructucted using ORs (multiple nested JSONs for each element vs
> >     simple JSON array).
> >
> > Example to reproduce (for Mongo Adapter)
> >
> > @Testpublic void subQueryTheshold() {
> >      // SqlToRelConverter.DEFAULT_IN_SUB_QUERY_THRESHOLD = 20
> >      // just a long list of ids (>20 to trigger query rewrite)
> >      final String ids = IntStream.range(0,
> > SqlToRelConverter.DEFAULT_IN_SUB_QUERY_THRESHOLD)
> >          .mapToObj(i -> String.format("'%d'", i))
> >          .collect(Collectors.joining(","));
> >
> >      assertModel(MODEL)
> >          .query(
> >              "select * from zips where state in (" + ids + ")")
> >          .explainContains("MongoFilter")
> >          .returns("");
> > }
> >
> > As usual, thanks a lot for your help.
> >
> > Andrei.
> >
>

Reply via email to