Hi Zoltan, Thanks for your reply.
I also found CALCITE-1321 <https://issues.apache.org/jira/browse/CALCITE-1321> which is related (defining threshold parameter) but not the same. The discussion there points the preference for planner rule (instead of SqlToRelConverter). Do you know how complicated it is ? Regards On Thu, Aug 23, 2018 at 9:58 AM Zoltan Haindrich <[email protected]> wrote: > Hello Andrei! > > I've also bumped into this earlier - from a little bit different angle :) > > I've found some "back story about it": It works currently like this > because a long time ago there was no other way to do the subquery > conversion of IN; > I think now the preferred way would be to have a rule to do that > conversion... > > About (3) leaving INs as is: I've experimented with that earlier and it > seems like there are some places where IN is handled only as a subquery. > If we don't open INs; then there is a chance that some expression > simplification will not happen (because it's not handled there either). > > For Hive we've decided to follow the Calcite way for now; so Hive also > opens ORs before it calls Calcite - but there's also a rule which tries to > close ORs into an IN. > > https://github.com/apache/hive/blob/236a32c645a21b04ccaf7f18db5c6a5aa53586e8/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java#L217 > > I plan to get back to this; or at least get CALCITE-2444. > > cheers, > Zoltan > > > On 08/14/2018 07:38 AM, Andrei Sereda wrote: > > Hello, > > > > I have noticed that our queries get converted into full table scans when > > they contain IN predicates with long list of elements (eg. id IN > ($manyIds) > > ). > > > > Upon some debuging the reason seems to be > > SqlToRelConverter.substituteSubQuery() > > < > https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1001 > >. > > Calcite will automatically translate subqueries into joins when > > DEFAULT_IN_SUB_QUERY_THRESHOLD > > < > https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L221 > > > > is > > reached (currently 20 elements). We have instances when $manyIds can have > > 10K of elements (out of 50M rows table). > > > > Some questions: > > > > 1. > > > > In order to skip this query translation seems like I have to change > > SqlToRelConverter.Config.InSubQueryThreshold to MAX_VALUE. How can > one > > do it in CalciteConnection or PreparedStatement ? Is is the only way > ? > > 2. > > > > Is SubQueryConverter > > < > https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SubQueryConverter.java > > > > of any use in this particular case ? > > 3. > > > > Can one keep IN predicate "as is" without converting to disjunctions > > (see convertInToOr > > < > https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1396 > >) > > ? > > The reason I’m asking is because our elastic queries get pretty big > if > > constructucted using ORs (multiple nested JSONs for each element vs > > simple JSON array). > > > > Example to reproduce (for Mongo Adapter) > > > > @Testpublic void subQueryTheshold() { > > // SqlToRelConverter.DEFAULT_IN_SUB_QUERY_THRESHOLD = 20 > > // just a long list of ids (>20 to trigger query rewrite) > > final String ids = IntStream.range(0, > > SqlToRelConverter.DEFAULT_IN_SUB_QUERY_THRESHOLD) > > .mapToObj(i -> String.format("'%d'", i)) > > .collect(Collectors.joining(",")); > > > > assertModel(MODEL) > > .query( > > "select * from zips where state in (" + ids + ")") > > .explainContains("MongoFilter") > > .returns(""); > > } > > > > As usual, thanks a lot for your help. > > > > Andrei. > > >
