Agreed, it should be in reverse order. Translate to semi-join (or anti-join for NOT EXISTS), then optionally use a rule to rewrite semi- or anti-join to Join+Aggregate.
Note that if the EXISTS is in a disjunction (e.g. "delete from orders where exists (select null from order_items where ...) or shipping > 20") we cannot use a semi-join. We have to use a left join, using an indicator column on the right-hand side that will be null iff there is no match. Which is what we do currently. On Wed, Feb 19, 2020 at 10:03 AM Haisheng Yuan <[email protected]> wrote: > > Hi Christian, > > For the query in your example, Calcite first generates inner join plan with > aggregate child, then through SemJoinRule transform the inner join to semi or > antisemi join. The reason to have inner join is that it allows join > commutativity, which is good for generating a potential better plan with > nestedloop join or hash join. > > Admittedly, this process in Calcite is counter intuitive. It should be in > reverse order, first generate a semi or anti-semi join, then generate an > inner/outer join. > > - Haisheng > > ------------------------------------------------------------------ > 发件人:Christian Beikov<[email protected]> > 日 期:2020年02月19日 21:12:13 > 收件人:<[email protected]> > 主 题:Translation of SQL EXISTS > > Hello, > > I'm a bit confused about how the SQL EXISTS predicate is translated. I'd > assume that an EXISTS is translated in relational algebra to a SEMI- and > NOT EXISTS to an ANTI-join, but it's not. > > PlannerImpl p = new PlannerImpl(config); > SqlNode sqlNode = p.parse("delete from _order o where exists (select 1 > from order_position p where o.id = p.order_id)"); > p.validate(sqlNode); > RelRoot rel = p.rel(sqlNode); > RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect); > SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel); > sqlWriter.format(result.asStatement()); > > Worse, when printing this, I only get DELETE FROM "public"."_order" i.e. > the EXISTS part is not rendered. This is the plan I get. > > LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], > flattened=[true]) > LogicalProject(inputs=[0]) > LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]]) > LogicalJoin(condition=[=($0, $1)], joinType=[inner]) > JdbcTableScan(table=[[adhoc, _order]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(exprs=[[$1, true]]) > JdbcTableScan(table=[[adhoc, order_position]]) > > I'd expect something along the lines of > > LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], > flattened=[true]) > LogicalProject(inputs=[0]) > LogicalJoin(condition=[=($0, $1)], joinType=[semi]) > JdbcTableScan(table=[[adhoc, _order]]) > JdbcTableScan(table=[[adhoc, order_position]]) > > and for NOT EXISTS > > LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], > flattened=[true]) > LogicalProject(inputs=[0]) > LogicalJoin(condition=[=($0, $1)], joinType=[anti]) > JdbcTableScan(table=[[adhoc, _order]]) > JdbcTableScan(table=[[adhoc, order_position]]) > > Am I missing something and the current aggregate function translation > makes sense? > > I constructed relational algebra structures for some other statements > with SEMI- and ANTI-joins and already noticed that these join types > weren't handled in > org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join), > which I fixed locally. Is the lack of a translation intentional? > > Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an > over-simplification or would you say it's correct? As far as I > understood from https://en.wikipedia.org/wiki/Relational_algebra this is > correct. > > I'd be happy to contribute that back. I didn't look into the Sql-to-Rel > translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I > assume that's not that hard and I could add that. > > Regards, > > Christian > >
