Hello, I have noticed that our queries get converted into full table scans when they contain IN predicates with long list of elements (eg. id IN ($manyIds) ).
Upon some debuging the reason seems to be SqlToRelConverter.substituteSubQuery() <https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1001>. Calcite will automatically translate subqueries into joins when DEFAULT_IN_SUB_QUERY_THRESHOLD <https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L221> is reached (currently 20 elements). We have instances when $manyIds can have 10K of elements (out of 50M rows table). Some questions: 1. In order to skip this query translation seems like I have to change SqlToRelConverter.Config.InSubQueryThreshold to MAX_VALUE. How can one do it in CalciteConnection or PreparedStatement ? Is is the only way ? 2. Is SubQueryConverter <https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SubQueryConverter.java> of any use in this particular case ? 3. Can one keep IN predicate "as is" without converting to disjunctions (see convertInToOr <https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1396>) ? The reason I’m asking is because our elastic queries get pretty big if constructucted using ORs (multiple nested JSONs for each element vs simple JSON array). Example to reproduce (for Mongo Adapter) @Testpublic void subQueryTheshold() { // SqlToRelConverter.DEFAULT_IN_SUB_QUERY_THRESHOLD = 20 // just a long list of ids (>20 to trigger query rewrite) final String ids = IntStream.range(0, SqlToRelConverter.DEFAULT_IN_SUB_QUERY_THRESHOLD) .mapToObj(i -> String.format("'%d'", i)) .collect(Collectors.joining(",")); assertModel(MODEL) .query( "select * from zips where state in (" + ids + ")") .explainContains("MongoFilter") .returns(""); } As usual, thanks a lot for your help. Andrei.
