Hello, everybody.

Trying to resolve an Apache Flink issue I got some troubles with Calcite. Can 
you help me to understand is there a problem in Calcite or just in wrong 
settings passed to Calcite functions?

I have a simple table "Words" with one column named "word" and a query with NOT 
IN operator:
val query = "SELECT word FROM Words WHERE word NOT IN (SELECT word FROM Words 
WHERE word = 'hello')"

This query parsed by org.apache.calcite.sql.parser.SqlParser.parseStmt() and 
then transformed to a relational tree by 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(...).

As a result I see the following abstract syntax tree
LogicalProject(word=[$0])
  LogicalFilter(condition=[NOT(CASE(=($1, 0), false, IS NOT NULL($5), true, IS 
NULL($3), null, <($2, $1), null, false))])
    LogicalJoin(condition=[=($3, $4)], joinType=[left])
      LogicalProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$0])
        LogicalJoin(condition=[true], joinType=[inner])
          EnumerableTableScan(table=[[Words]])
          LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
            LogicalProject($f0=[$0], $f1=[true])
              LogicalProject(word=[$0])
                LogicalFilter(condition=[=($0, 'hello')])
                  EnumerableTableScan(table=[[Words]])
      LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
        LogicalProject($f0=[$0], $f1=[true])
          LogicalProject(word=[$0])
            LogicalFilter(condition=[=($0, 'hello')])
              EnumerableTableScan(table=[[Words]])

which fails later during query plan optimization (while calling 
org.apache.calcite.tools.Programs.RuleSetProgram.run()).

I think it might be because of a very complex abstract syntax tree generated by 
Calcite. Shouldn't it be more simple? This one looks good for me:
LogicalProject(word=[$0])
  LogicalFilter(condition=[IS NULL($2)])
    LogicalJoin(condition=[=($0, $1)], joinType=[left])
      EnumerableTableScan(table=[[Words]])
      LogicalProject($f0=[$0], $f1=[true])
        LogicalProject(word=[$0])
          LogicalFilter(condition=[=($0, 'hello')])
            EnumerableTableScan(table=[[Words]])

And when I write a query using LEFT OUTER JOIN to receive this syntax tree - 
the optimization works fine. And the query execution result is the same as must 
be in case of using NOT IN. So am I wrong with a supposition about bad abstract 
syntax tree or not? I will be glad to receive any comments.

Regards,
Alexander

Reply via email to