[
https://issues.apache.org/jira/browse/CALCITE-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922167#comment-17922167
]
Anton Kovalevsky commented on CALCITE-6804:
-------------------------------------------
Positive scenario.
We know that simple anti-joins work.
{noformat}
SELECT * FROM (
select * from (
select e1.product_id
FROM foodmart.product e1
)
) selected
where not exists (
select 1
from foodmart.product e2 where selected.product_id = e2.product_id
){noformat}
The plan:
{noformat}
LogicalProject(product_id=[$0])
LogicalFilter(condition=[NOT(EXISTS({
LogicalFilter(condition=[=($cor0.product_id, $1)])
JdbcTableScan(table=[[foodmart, product]])
}))], variablesSet=[[$cor0]])
LogicalProject(product_id=[$1])
JdbcTableScan(table=[[foodmart, product]]){noformat}
I added log lines to follow the logic of:
* filling correlation table and
* alias generation
{noformat}
private void parseCorrelTable(RelNode relNode, Result x) {
for (CorrelationId id : relNode.getVariablesSet()) {
System.out.println("===============================================");
System.out.println("Filling[needed=" +x.neededAlias +"] [var=" + id + "] by
[" + x.node.toString() +"]");
System.out.println("");
correlTableMap.put(id, x.qualifiedContext());
}
}{noformat}
{noformat}
org.apache.calcite.rel.rel2sql.SqlImplementor#result(org.apache.calcite.sql.SqlNode,
java.util.Collection<org.apache.calcite.rel.rel2sql.SqlImplementor.Clause>,
org.apache.calcite.rel.RelNode,
java.util.Map<java.lang.String,org.apache.calcite.rel.type.RelDataType>){noformat}
ends with logging:
{noformat}
System.out.println("For node: " + node.toString() + " alias is " + alias5);
return result(node, clauses, alias5, rowType,
ImmutableMap.of(alias4, rowType));{noformat}
Let's see how the normal flow goes:
{noformat}
For node: [foodmart.product] alias is null
For node: [SELECT `product_id` FROM `foodmart`.`product`] alias is t
===============================================
Filling[needed=t] [var=$cor0] by [SELECT `product_id` FROM
`foodmart`.`product`]{noformat}
So, it is quite naively expected. We visited whole our input. Called it
"{*}t{*}" and moved further.
*$cor0* received legitimate assignment and is not being touched.
I also showing rest of aliasing logic.
{noformat}
For node: foodmart.product alias is product0
For node: [SELECT * FROM `foodmart`.`product` WHERE `t`.`product_id` =
`product_id`] alias is t0
For node:
[
SELECT * FROM (SELECT `product_id` FROM `foodmart`.`product`) AS `t`
WHERE NOT EXISTS (
SELECT * FROM `foodmart`.`product`WHERE `t`.`product_id` = `product_id`
)
] alias is t1
For node:
[
SELECT *
FROM (SELECT `product_id`
FROM `foodmart`.`product`) AS `t`
WHERE NOT EXISTS (SELECT *
FROM `foodmart`.`product`
WHERE `t`.`product_id` = `product_id`)
] alias is t2
{noformat}
As you can see, cor0 is settled once and nothing changed.
Now, let me tune `selected` clause to be more complex.
{noformat}
(
select * from (
select e1.product_id
FROM foodmart.product e1
LEFT JOIN
foodmart.product e3 on e1.product_id = e3.product_id
) s where true
) selected {noformat}
SqlToRel step worked in expected way and just added join to the input:
{noformat}
LogicalProject(product_id=[$0])
LogicalFilter(condition=[NOT(EXISTS({
LogicalFilter(condition=[=($cor0.product_id, $1)])
JdbcTableScan(table=[[foodmart, product]])
}))], variablesSet=[[$cor0]])
LogicalProject(product_id=[$1])
LogicalJoin(condition=[=($1, $16)], joinType=[left])
JdbcTableScan(table=[[foodmart, product]])
JdbcTableScan(table=[[foodmart, product]])
{noformat}
And decorrelation of *cor0* goes exactly the same:
{noformat}
For node: foodmart.product alias is null
For node: foodmart.product alias is product0
===============================================
Filling[needed=t] [var=$cor0] by [
SELECT `product`.`product_id`
FROM `foodmart`.`product`
LEFT JOIN `foodmart`.`product` AS `product0`
ON `product`.`product_id` = `product0`.`product_id`
]{noformat}
{noformat}
For node: foodmart.product alias is product1
For node:
[
SELECT *
FROM `foodmart`.`product`
WHERE `product`.`product_class_id` = `product_id`
] alias is t0
For node:
[
SELECT *
FROM (SELECT `product`.`product_id`
FROM `foodmart`.`product`
LEFT JOIN `foodmart`.`product` AS `product0` ON `product`.`product_id` =
`product0`.`product_id`) AS `t`
WHERE NOT EXISTS (SELECT *
FROM `foodmart`.`product`
WHERE `product`.`product_class_id` = `product_id`)
] alias is t1
{noformat}
{noformat}
NOT(EXISTS({
LogicalFilter(condition=[=($cor0.product_id, $1)])
JdbcTableScan(table=[[foodmart, product]])
})){noformat}
is turned into
{noformat}
WHERE NOT EXISTS (SELECT *
FROM `foodmart`.`product`
WHERE `product`.`product_class_id` = `product_id`){noformat}
because result of visiting top-level filter misses actual input:
{noformat}
final Result x = visitInput(e, 0, Clause.WHERE);
{noformat}
x.aliases are *product* and {*}product0{*}, pure table scans.
These lines trigger visit of *RexSubQuery*
{noformat}
builder.setWhere(builder.context.toSql(null, e.getCondition()));
{noformat}
{noformat}
case SCALAR_QUERY:
subQuery = (RexSubQuery) rex;
sqlSubQuery =
implementor().visitRoot(subQuery.rel).asQueryOrValues();
{noformat}
And that subquery has plan
{noformat}
LogicalFilter(condition=[=($cor0.product_id, $1)])
JdbcTableScan(table=[[foodmart, product]]){noformat}
Here something goes wrong and cor0 is "decorrelated" in bad way (no usage of
the former lookup).
> Anti-join with WHERE NOT EXISTS syntax has corrupted condition
> --------------------------------------------------------------
>
> Key: CALCITE-6804
> URL: https://issues.apache.org/jira/browse/CALCITE-6804
> Project: Calcite
> Issue Type: Bug
> Reporter: Anton Kovalevsky
> Priority: Major
>
> Queries like:
> {code:sql}
> SELECT * FROM
> (
> SELECT field1 FROM table1 JOIN table2 ON table1.field1 = table2.field2
> ) selected
> WHERE NOT EXISTS (select 1 from table3 where table3.field3 = selected.field1)
> {code}
> are being converted into
> {code:sql}
> SELECT * FROM
> (
> SELECT field1 FROM table1 JOIN table2 ON table1.field1 = table2.field2
> ) selected
> WHERE NOT EXISTS (select 1 from table3 where table3.field3 =
> table2.<random_field>)
> {code}
> Example I added to RelToSqlConverterTest
> {code:java}
> @Test void testAntiJoinWithWhereNotExists() {
> final String sql = "SELECT * FROM (select * from (select
> e1.\"product_id\"\n"
> + "FROM \"foodmart\".\"product\" e1 LEFT JOIN
> \"foodmart\".\"product\" e3 on e1.\"product_id\" = e3.\"product_id\") s where
> true) selected where not exists\n"
> + "(select 1 from \"foodmart\".\"product\" e2 where e2.\"product_id\"
> = selected.\"product_id\")";
> final String expected = "SELECT *\n" +
> "FROM (SELECT \"product\".\"product_id\"\n" +
> "FROM \"foodmart\".\"product\"\n" +
> "LEFT JOIN \"foodmart\".\"product\" AS \"product0\" ON
> \"product\".\"product_id\" = \"product0\".\"product_id\"" +
> ") AS \"t\"\n" +
> "WHERE EXISTS (SELECT *\nFROM \"foodmart\".\"product\"\nWHERE
> \"product_id\" = \"t\".\"product_class_id\")"
> ;
> sql(sql).ok(expected);
> }
> {code}
> {code:java}
> Expected: is "SELECT ...) AS \"t\"\nWHERE EXISTS (... WHERE \"product_id\" =
> \"t\".\"product_id\")"
> but: was "SELECT ...) AS \"t\"\nWHERE EXISTS (... WHERE \"product_id\" =
> \"product1\".\"product_class_id\")"
> {code}
> *product1* is generated alias for a query from one of sub-queries, and
> *product_class_id* is a field from that misused table scan.
> My high level understanding is that query with WHERE NOT EXISTS syntax is
> considered as LogicalFilter and appropriate pieces of code (like
> *AliasReplacementShuttle* and {*}visitAntiOrSemiJoin{*}) are not invoked.
> And visit of Filter node builds alias context inappropriately.
> Directions I am trying:
> - Duplicate antiJoin visit under the scope of filter visit.
> - Explicit rule to convert Filter to Join manually. Feels artificial because
> it planner should be triggered, it requires a convention.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)