Hey Haisheng,
it is nice to have a rule that detects such patterns but do you agree
that it would be better to generate SEMI- and ANTI-joins already during
Sql-to-Rel conversion?
I implemented the Rel-to-Sql conversion for SEMI- and ANTI-join and it
seems to work for my use cases, but it feels a bit ugly as I have to do
some transformations which might be ad-hoc. I had to replace identifiers
that refer to the outcome of the SEMI- and ANTI-joins with the select
items of the LHS because the SEMI- and ANTI-join does not get an alias.
See for yourself:
https://github.com/beikov/calcite/commit/6ca67e2e8df44c2081e71407ee6893a961f36271
I'll add a few tests and a comment, but could anyone please comment on
whether this approach looks ok?
Regards,
Christian
Am 19.02.2020 um 19:03 schrieb Haisheng Yuan:
Hi Christian,
For the query in your example, Calcite first generates inner join plan
with aggregate child, then through SemJoinRule transform the inner
join to semi or antisemi join. The reason to have inner join is that
it allows join commutativity, which is good for generating a potential
better plan with nestedloop join or hash join.
Admittedly, this process in Calcite is counter intuitive. It should be
in reverse order, first generate a semi or anti-semi join, then
generate an inner/outer join.
- Haisheng
------------------------------------------------------------------
发件人:Christian Beikov<[email protected]>
日 期:2020年02月19日 21:12:13
收件人:<[email protected]>
主 题:Translation of SQL EXISTS
Hello,
I'm a bit confused about how the SQL EXISTS predicate is translated. I'd
assume that an EXISTS is translated in relational algebra to a SEMI- and
NOT EXISTS to an ANTI-join, but it's not.
PlannerImpl p = new PlannerImpl(config);
SqlNode sqlNode = p.parse("delete from _order o where exists (select 1
from order_position p where o.id = p.order_id)");
p.validate(sqlNode);
RelRoot rel = p.rel(sqlNode);
RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
sqlWriter.format(result.asStatement());
Worse, when printing this, I only get DELETE FROM "public"."_order" i.e.
the EXISTS part is not rendered. This is the plan I get.
LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
flattened=[true])
LogicalProject(inputs=[0])
LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
LogicalJoin(condition=[=($0, $1)], joinType=[inner])
JdbcTableScan(table=[[adhoc, _order]])
LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
LogicalProject(exprs=[[$1, true]])
JdbcTableScan(table=[[adhoc, order_position]])
I'd expect something along the lines of
LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
flattened=[true])
LogicalProject(inputs=[0])
LogicalJoin(condition=[=($0, $1)], joinType=[semi])
JdbcTableScan(table=[[adhoc, _order]])
JdbcTableScan(table=[[adhoc, order_position]])
and for NOT EXISTS
LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
flattened=[true])
LogicalProject(inputs=[0])
LogicalJoin(condition=[=($0, $1)], joinType=[anti])
JdbcTableScan(table=[[adhoc, _order]])
JdbcTableScan(table=[[adhoc, order_position]])
Am I missing something and the current aggregate function translation
makes sense?
I constructed relational algebra structures for some other statements
with SEMI- and ANTI-joins and already noticed that these join types
weren't handled in
org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join),
which I fixed locally. Is the lack of a translation intentional?
Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an
over-simplification or would you say it's correct? As far as I
understood from https://en.wikipedia.org/wiki/Relational_algebra this is
correct.
I'd be happy to contribute that back. I didn't look into the Sql-to-Rel
translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I
assume that's not that hard and I could add that.
Regards,
Christian