Hello, everybody.
Trying to resolve an Apache Flink issue I got some troubles with Calcite. Can
you help me to understand is there a problem in Calcite or just in wrong
settings passed to Calcite functions?
I have a simple table "Words" with one column named "word" and a query with NOT
IN operator:
val query = "SELECT word FROM Words WHERE word NOT IN (SELECT word FROM Words
WHERE word = 'hello')"
This query parsed by org.apache.calcite.sql.parser.SqlParser.parseStmt() and
then transformed to a relational tree by
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(...).
As a result I see the following abstract syntax tree
LogicalProject(word=[$0])
LogicalFilter(condition=[NOT(CASE(=($1, 0), false, IS NOT NULL($5), true, IS
NULL($3), null, <($2, $1), null, false))])
LogicalJoin(condition=[=($3, $4)], joinType=[left])
LogicalProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$0])
LogicalJoin(condition=[true], joinType=[inner])
EnumerableTableScan(table=[[Words]])
LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
LogicalProject($f0=[$0], $f1=[true])
LogicalProject(word=[$0])
LogicalFilter(condition=[=($0, 'hello')])
EnumerableTableScan(table=[[Words]])
LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
LogicalProject($f0=[$0], $f1=[true])
LogicalProject(word=[$0])
LogicalFilter(condition=[=($0, 'hello')])
EnumerableTableScan(table=[[Words]])
which fails later during query plan optimization (while calling
org.apache.calcite.tools.Programs.RuleSetProgram.run()).
I think it might be because of a very complex abstract syntax tree generated by
Calcite. Shouldn't it be more simple? This one looks good for me:
LogicalProject(word=[$0])
LogicalFilter(condition=[IS NULL($2)])
LogicalJoin(condition=[=($0, $1)], joinType=[left])
EnumerableTableScan(table=[[Words]])
LogicalProject($f0=[$0], $f1=[true])
LogicalProject(word=[$0])
LogicalFilter(condition=[=($0, 'hello')])
EnumerableTableScan(table=[[Words]])
And when I write a query using LEFT OUTER JOIN to receive this syntax tree -
the optimization works fine. And the query execution result is the same as must
be in case of using NOT IN. So am I wrong with a supposition about bad abstract
syntax tree or not? I will be glad to receive any comments.
Regards,
Alexander