[ 
https://issues.apache.org/jira/browse/CALCITE-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922167#comment-17922167
 ] 

Anton Kovalevsky commented on CALCITE-6804:
-------------------------------------------

 

Positive scenario.

We know that simple anti-joins work.

 

 
{noformat}
SELECT * FROM (
  select * from (
    select e1.product_id
    FROM foodmart.product e1
  )
) selected 
where not exists (
  select 1 
  from foodmart.product e2 where selected.product_id = e2.product_id
){noformat}
The plan:

 

 
{noformat}
LogicalProject(product_id=[$0])
  LogicalFilter(condition=[NOT(EXISTS({
LogicalFilter(condition=[=($cor0.product_id, $1)])
  JdbcTableScan(table=[[foodmart, product]])
}))], variablesSet=[[$cor0]])
    LogicalProject(product_id=[$1])
      JdbcTableScan(table=[[foodmart, product]]){noformat}
 

I added log lines to follow the logic of:
 * filling correlation table and
 * alias generation

{noformat}
private void parseCorrelTable(RelNode relNode, Result x) {
  for (CorrelationId id : relNode.getVariablesSet()) {
    System.out.println("===============================================");
    System.out.println("Filling[needed=" +x.neededAlias +"] [var=" + id + "] by 
[" + x.node.toString() +"]");
    System.out.println("");

    correlTableMap.put(id, x.qualifiedContext());
  }
}{noformat}
 
{noformat}
org.apache.calcite.rel.rel2sql.SqlImplementor#result(org.apache.calcite.sql.SqlNode,
 java.util.Collection<org.apache.calcite.rel.rel2sql.SqlImplementor.Clause>, 
org.apache.calcite.rel.RelNode, 
java.util.Map<java.lang.String,org.apache.calcite.rel.type.RelDataType>){noformat}
ends with logging:
{noformat}
System.out.println("For node: " + node.toString() + " alias is " + alias5);
return result(node, clauses, alias5, rowType,
    ImmutableMap.of(alias4, rowType));{noformat}
 

 

Let's see how the normal flow goes:

 

 

 
{noformat}
For node: [foodmart.product] alias is null

For node: [SELECT `product_id` FROM `foodmart`.`product`] alias is t

===============================================
Filling[needed=t] [var=$cor0] by [SELECT `product_id` FROM 
`foodmart`.`product`]{noformat}
So, it is quite naively expected. We visited whole our input. Called it 
"{*}t{*}" and moved further.

 

*$cor0* received legitimate assignment and is not being touched.

I also showing rest of aliasing logic.

 
{noformat}
For node: foodmart.product alias is product0

For node: [SELECT * FROM `foodmart`.`product` WHERE `t`.`product_id` = 
`product_id`] alias is t0

For node: 
 [
   SELECT * FROM (SELECT `product_id` FROM `foodmart`.`product`) AS `t`
   WHERE NOT EXISTS (
     SELECT * FROM `foodmart`.`product`WHERE `t`.`product_id` = `product_id`
   )
 ] alias is t1

For node: 
 [
   SELECT *
   FROM (SELECT `product_id`
   FROM `foodmart`.`product`) AS `t`
   WHERE NOT EXISTS (SELECT *
   FROM `foodmart`.`product`
   WHERE `t`.`product_id` = `product_id`)
 ] alias is t2
{noformat}
As you can see, cor0 is settled once and nothing changed.

 

Now, let me tune `selected` clause to be more complex.

 
{noformat}
(
  select * from (
    select e1.product_id
    FROM foodmart.product e1 
    LEFT JOIN 
      foodmart.product e3 on e1.product_id = e3.product_id
  ) s where true
) selected {noformat}
SqlToRel step worked in expected way and just added join to the input:

 

 
{noformat}
LogicalProject(product_id=[$0])
  LogicalFilter(condition=[NOT(EXISTS({
LogicalFilter(condition=[=($cor0.product_id, $1)])
  JdbcTableScan(table=[[foodmart, product]])
}))], variablesSet=[[$cor0]])
    LogicalProject(product_id=[$1])
      LogicalJoin(condition=[=($1, $16)], joinType=[left])
        JdbcTableScan(table=[[foodmart, product]])
        JdbcTableScan(table=[[foodmart, product]])
{noformat}

And decorrelation of *cor0* goes exactly the same:

 

 
{noformat}
For node: foodmart.product alias is null

For node: foodmart.product alias is product0

===============================================
Filling[needed=t] [var=$cor0] by [
  SELECT `product`.`product_id` 
  FROM `foodmart`.`product`
  LEFT JOIN `foodmart`.`product` AS `product0` 
  ON `product`.`product_id` = `product0`.`product_id`
]{noformat}
 

 
{noformat}
For node: foodmart.product alias is product1

For node: 
 [
   SELECT *
   FROM `foodmart`.`product`
   WHERE `product`.`product_class_id` = `product_id`
 ] alias is t0

For node: 
 [
   SELECT *
   FROM (SELECT `product`.`product_id`
   FROM `foodmart`.`product`
   LEFT JOIN `foodmart`.`product` AS `product0` ON `product`.`product_id` = 
`product0`.`product_id`) AS `t`
   WHERE NOT EXISTS (SELECT *
   FROM `foodmart`.`product`
   WHERE `product`.`product_class_id` = `product_id`)
 ] alias is t1
{noformat}
 

 

 

 
{noformat}
NOT(EXISTS({
LogicalFilter(condition=[=($cor0.product_id, $1)])
  JdbcTableScan(table=[[foodmart, product]])
})){noformat}
is turned into 
{noformat}
WHERE NOT EXISTS (SELECT *
FROM `foodmart`.`product`
WHERE `product`.`product_class_id` = `product_id`){noformat}
because result of visiting top-level filter misses actual input:
{noformat}
final Result x = visitInput(e, 0, Clause.WHERE);
{noformat}
x.aliases are *product* and {*}product0{*}, pure table scans.

These lines trigger visit of *RexSubQuery*
{noformat}
builder.setWhere(builder.context.toSql(null, e.getCondition()));
{noformat}
{noformat}
case SCALAR_QUERY:
  subQuery = (RexSubQuery) rex;
  sqlSubQuery =
      implementor().visitRoot(subQuery.rel).asQueryOrValues();
{noformat}
And that subquery has plan
{noformat}
LogicalFilter(condition=[=($cor0.product_id, $1)])
  JdbcTableScan(table=[[foodmart, product]]){noformat}
Here something goes wrong and cor0 is "decorrelated" in bad way (no usage of 
the former lookup).

 

 

 

 

> Anti-join with WHERE NOT EXISTS syntax has corrupted condition
> --------------------------------------------------------------
>
>                 Key: CALCITE-6804
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6804
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Anton Kovalevsky
>            Priority: Major
>
> Queries like:
> {code:sql}
> SELECT * FROM
> (
>   SELECT field1 FROM table1 JOIN table2 ON table1.field1 = table2.field2
> ) selected
> WHERE NOT EXISTS (select 1 from table3 where table3.field3 = selected.field1)
> {code}
> are being converted into 
> {code:sql}
> SELECT * FROM
> (
>   SELECT field1 FROM table1 JOIN table2 ON table1.field1 = table2.field2
> ) selected
> WHERE NOT EXISTS (select 1 from table3 where table3.field3 = 
> table2.<random_field>)
> {code}
> Example I added to RelToSqlConverterTest
> {code:java}
>   @Test void testAntiJoinWithWhereNotExists() {
>     final String sql = "SELECT * FROM (select * from (select 
> e1.\"product_id\"\n"
>         + "FROM \"foodmart\".\"product\" e1 LEFT JOIN 
> \"foodmart\".\"product\" e3 on e1.\"product_id\" = e3.\"product_id\") s where 
> true) selected where not exists\n"
>         + "(select 1 from \"foodmart\".\"product\" e2 where e2.\"product_id\" 
> = selected.\"product_id\")";
>     final String expected = "SELECT *\n" +
>         "FROM (SELECT \"product\".\"product_id\"\n" +
>         "FROM \"foodmart\".\"product\"\n" +
>         "LEFT JOIN \"foodmart\".\"product\" AS \"product0\" ON 
> \"product\".\"product_id\" = \"product0\".\"product_id\"" +
>         ") AS \"t\"\n" +
>         "WHERE EXISTS (SELECT *\nFROM \"foodmart\".\"product\"\nWHERE 
> \"product_id\" = \"t\".\"product_class_id\")"
>         ;
>     sql(sql).ok(expected);
>   }
> {code}
> {code:java}
> Expected: is "SELECT ...) AS \"t\"\nWHERE EXISTS (... WHERE \"product_id\" = 
> \"t\".\"product_id\")"
>     but: was "SELECT ...) AS \"t\"\nWHERE EXISTS (... WHERE \"product_id\" = 
> \"product1\".\"product_class_id\")"
> {code}
> *product1* is generated alias for a query from one of sub-queries, and 
> *product_class_id* is a field from that misused table scan.
> My high level understanding is that query with WHERE NOT EXISTS syntax is 
> considered as LogicalFilter and appropriate pieces of code (like 
> *AliasReplacementShuttle* and {*}visitAntiOrSemiJoin{*}) are not invoked.
> And visit of Filter node builds alias context inappropriately.
> Directions I am trying:
>  - Duplicate antiJoin visit under the scope of filter visit.
>  - Explicit rule to convert Filter to Join manually. Feels artificial because 
> it planner should be triggered, it requires a convention.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to