Hi Julian, During my work on subquery support in Hive I am running into similar issue with almost same query where the CASE statement [NOT(CASE(=($1, 0), false, IS NOT NULL($5), true, IS NULL($3), null, <($2, $1), null, false))] is generated with a null constant expression but with NULL type. Is that a valid thing for calcite to do ? Or should it be generating a typed null constant ?
I noticed most of the time case statement is simplified and typed (within Subquery remove rule). But there are statements which are left un-simplified and with untyped expression. I wonder if this is a bug. Thanks, Vineet On 10/31/16, 12:42 PM, "Julian Hyde" <[email protected]> wrote: >The behavior of NOT IN in SQL is complicated when there are NULL >values around. In particular, if one "word" value from the sub-query >is null, then the outer query must return 0 rows. (Why? Because "word >NOT IN ('foo', 'bar' null)" would evaluate to UNKNOWN for every row.) > >It is valid to deduce that "word" in the sub-query is never null, >because of the "WHERE word = 'hello'" condition. I would have hoped >that a constant reduction could do that, and then maybe the CASE >expression can be simplified. > >By the way, to be pedantic, what we are talking about here is the >RelNode tree, the relational algebra, which comes out of the >SqlToRelConverter. The AST is the SqlNode tree, which comes out of the >parser and goes into the SqlToRelConverter. > >On Mon, Oct 31, 2016 at 8:46 AM, Alexander Shoshin ><[email protected]> wrote: >> Hello, everybody. >> >> Trying to resolve an Apache Flink issue I got some troubles with Calcite. >> Can you help me to understand is there a problem in Calcite or just in wrong >> settings passed to Calcite functions? >> >> I have a simple table "Words" with one column named "word" and a query with >> NOT IN operator: >> val query = "SELECT word FROM Words WHERE word NOT IN (SELECT word FROM >> Words WHERE word = 'hello')" >> >> This query parsed by org.apache.calcite.sql.parser.SqlParser.parseStmt() and >> then transformed to a relational tree by >> org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(...). >> >> As a result I see the following abstract syntax tree >> LogicalProject(word=[$0]) >> LogicalFilter(condition=[NOT(CASE(=($1, 0), false, IS NOT NULL($5), true, >> IS NULL($3), null, <($2, $1), null, false))]) >> LogicalJoin(condition=[=($3, $4)], joinType=[left]) >> LogicalProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$0]) >> LogicalJoin(condition=[true], joinType=[inner]) >> EnumerableTableScan(table=[[Words]]) >> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)]) >> LogicalProject($f0=[$0], $f1=[true]) >> LogicalProject(word=[$0]) >> LogicalFilter(condition=[=($0, 'hello')]) >> EnumerableTableScan(table=[[Words]]) >> LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) >> LogicalProject($f0=[$0], $f1=[true]) >> LogicalProject(word=[$0]) >> LogicalFilter(condition=[=($0, 'hello')]) >> EnumerableTableScan(table=[[Words]]) >> >> which fails later during query plan optimization (while calling >> org.apache.calcite.tools.Programs.RuleSetProgram.run()). >> >> I think it might be because of a very complex abstract syntax tree generated >> by Calcite. Shouldn't it be more simple? This one looks good for me: >> LogicalProject(word=[$0]) >> LogicalFilter(condition=[IS NULL($2)]) >> LogicalJoin(condition=[=($0, $1)], joinType=[left]) >> EnumerableTableScan(table=[[Words]]) >> LogicalProject($f0=[$0], $f1=[true]) >> LogicalProject(word=[$0]) >> LogicalFilter(condition=[=($0, 'hello')]) >> EnumerableTableScan(table=[[Words]]) >> >> And when I write a query using LEFT OUTER JOIN to receive this syntax tree - >> the optimization works fine. And the query execution result is the same as >> must be in case of using NOT IN. So am I wrong with a supposition about bad >> abstract syntax tree or not? I will be glad to receive any comments. >> >> Regards, >> Alexander >
