Hi Christian,

For the query in your example, Calcite first generates inner join plan with 
aggregate child, then through SemJoinRule transform the inner join to semi or 
antisemi join. The reason to have inner join is that it allows join 
commutativity, which is good for generating a potential better plan with 
nestedloop join or hash join.

Admittedly, this process in Calcite is counter intuitive. It should be in 
reverse order, first generate a semi or anti-semi join, then generate an 
inner/outer join.

- Haisheng

------------------------------------------------------------------
发件人:Christian Beikov<[email protected]>
日 期:2020年02月19日 21:12:13
收件人:<[email protected]>
主 题:Translation of SQL EXISTS

Hello,

I'm a bit confused about how the SQL EXISTS predicate is translated. I'd 
assume that an EXISTS is translated in relational algebra to a SEMI- and 
NOT EXISTS to an ANTI-join, but it's not.

PlannerImpl p = new PlannerImpl(config);
SqlNode sqlNode = p.parse("delete from _order o where exists (select 1 
from order_position p where o.id = p.order_id)");
p.validate(sqlNode);
RelRoot rel = p.rel(sqlNode);
RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
sqlWriter.format(result.asStatement());

Worse, when printing this, I only get DELETE FROM "public"."_order" i.e. 
the EXISTS part is not rendered. This is the plan I get.

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], 
flattened=[true])
   LogicalProject(inputs=[0])
     LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
       LogicalJoin(condition=[=($0, $1)], joinType=[inner])
         JdbcTableScan(table=[[adhoc, _order]])
         LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
           LogicalProject(exprs=[[$1, true]])
             JdbcTableScan(table=[[adhoc, order_position]])

I'd expect something along the lines of

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], 
flattened=[true])
   LogicalProject(inputs=[0])
       LogicalJoin(condition=[=($0, $1)], joinType=[semi])
         JdbcTableScan(table=[[adhoc, _order]])
         JdbcTableScan(table=[[adhoc, order_position]])

and for NOT EXISTS

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], 
flattened=[true])
   LogicalProject(inputs=[0])
       LogicalJoin(condition=[=($0, $1)], joinType=[anti])
         JdbcTableScan(table=[[adhoc, _order]])
         JdbcTableScan(table=[[adhoc, order_position]])

Am I missing something and the current aggregate function translation 
makes sense?

I constructed relational algebra structures for some other statements 
with SEMI- and ANTI-joins and already noticed that these join types 
weren't handled in 
org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join),
 
which I fixed locally. Is the lack of a translation intentional?

Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an 
over-simplification or would you say it's correct? As far as I 
understood from https://en.wikipedia.org/wiki/Relational_algebra this is 
correct.

I'd be happy to contribute that back. I didn't look into the Sql-to-Rel 
translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I 
assume that's not that hard and I could add that.

Regards,

Christian


Reply via email to