[ 
https://issues.apache.org/jira/browse/CALCITE-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated CALCITE-3571:
-----------------------------------
    Description: 
Table `tblA` schema: 
||field_name||field_type||
|id|VARBINARY|
|fA1|VARCHAR|

 Table `tblB` schema:
||field_name||field_type||
|id|VARCHAR|
|fB1|VARCHAR|

For the following query:
{code:java}
WITH query as (select SELECT id, fA1, fA1 AS fA1_2 FROM tblA)
SELECT fA1, fB1, fA1_2
FROM query
JOIN tblB ON (TO_HEX(query.id) = tblB.id){code}
This example uses TO_HEX as an example, but it can be any other expression; for 
example `cast query.id as Integer`.

Out of bound exception is thrown in SqlToRelConverter. 

 

Before joins are created left and right paths are parsed first. For the 1st 
query above they are as follows:
{code:java}
Left:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)

Right:
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
As they are processed - they are registered as leaves (added to the Array).

 

When Join node is being created it knows what the `condition expressions` is:
{code:java}
=(TO_HEX($0), $3)
{code}
Since TO_HEX is not computed anywhere - it modifies the left input to be as 
follows (via RelOptUtil#pushDownJoinConditions) because 
RelBuilder#shouldMergeProject always return true. 
{code:java}
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
{code}
where `VARCHAR $f3` is a result of TO_HEX. Note that the list of leaves is not 
updated.

[https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L2571]

 

Finally, when identifier "query.fA1_2" is being converted (via 
SqlToRelConverter#convertIdentifier) for the top-most node
{code:java}
top-most node:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR id0, VARCHAR fB1)
  LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR 
$f3, VARCHAR id0, VARCHAR fB1)
    LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
      LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
    LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
Blackboard perform a lookup (via SqlToRelConverter#lookupExp), in process of 
which LookupContext is created.

In a constructor, LookupContext performs flatten, which recursively traverses  
tree of nodes (from above codeblock) and checks the leaves to see if they 
contain such expression. When it does get to the modified left input of a join 
it does not get a match on it and continues further down to a TableScan.

When it finally flattens the result, TableScan's RecordType knows nothing about 
a duplicated field `fA1_2`, causing an error above.

 

I think a viable solution would be to modify Join creation to register a 
resulting join inputs as leaves (when they are modified). Alternative approach 
would be to not merge Projects when join needs to modify an input.

  was:
Table `tblA` schema: 
||field_name||field_type||
|id|VARBINARY|
|fA1|VARCHAR|

 Table `tblB` schema:
||field_name||field_type||
|id|VARBINARY|
|fA1|VARCHAR|

For the following query:
{code:java}
WITH query as (select SELECT id, fA1, fA1 AS fA1_2 FROM tblA)
SELECT fA1, fB1, fA1_2
FROM query
JOIN tblB ON (TO_HEX(query.id) = tblB.id){code}
This example uses TO_HEX as an example, but it can be any other expression; for 
example `cast query.id as Integer`.

Out of bound exception is thrown in SqlToRelConverter. 

 

Before joins are created left and right paths are parsed first. For the 1st 
query above they are as follows:
{code:java}
Left:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)

Right:
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
As they are processed - they are registered as leaves (added to the Array).

 

When Join node is being created it knows what the `condition expressions` is:
{code:java}
=(TO_HEX($0), $3)
{code}
Since TO_HEX is not computed anywhere - it modifies the left input to be as 
follows (via RelOptUtil#pushDownJoinConditions) because 
RelBuilder#shouldMergeProject always return true. 
{code:java}
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
{code}
where `VARCHAR $f3` is a result of TO_HEX. Note that the list of leaves is not 
updated.

[https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L2571]

 

Finally, when identifier "query.fA1_2" is being converted (via 
SqlToRelConverter#convertIdentifier) for the top-most node
{code:java}
top-most node:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR id0, VARCHAR fB1)
  LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR 
$f3, VARCHAR id0, VARCHAR fB1)
    LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
      LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
    LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
Blackboard perform a lookup (via SqlToRelConverter#lookupExp), in process of 
which LookupContext is created.

In a constructor, LookupContext performs flatten, which recursively traverses  
tree of nodes (from above codeblock) and checks the leaves to see if they 
contain such expression. When it does get to the modified left input of a join 
it does not get a match on it and continues further down to a TableScan.

When it finally flattens the result, TableScan's RecordType knows nothing about 
a duplicated field `fA1_2`, causing an error above.

 

I think a viable solution would be to modify Join creation to register a 
resulting join inputs as leaves (when they are modified). Alternative approach 
would be to not merge Projects when join needs to modify an input.


> RelBuilder#shouldMergeProject throws an exception for JOIN with complex 
> conditions
> ----------------------------------------------------------------------------------
>
>                 Key: CALCITE-3571
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3571
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Kirill Kozlov
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Table `tblA` schema: 
> ||field_name||field_type||
> |id|VARBINARY|
> |fA1|VARCHAR|
>  Table `tblB` schema:
> ||field_name||field_type||
> |id|VARCHAR|
> |fB1|VARCHAR|
> For the following query:
> {code:java}
> WITH query as (select SELECT id, fA1, fA1 AS fA1_2 FROM tblA)
> SELECT fA1, fB1, fA1_2
> FROM query
> JOIN tblB ON (TO_HEX(query.id) = tblB.id){code}
> This example uses TO_HEX as an example, but it can be any other expression; 
> for example `cast query.id as Integer`.
> Out of bound exception is thrown in SqlToRelConverter. 
>  
> Before joins are created left and right paths are parsed first. For the 1st 
> query above they are as follows:
> {code:java}
> Left:
> LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2)
>   LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
> Right:
> LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
> As they are processed - they are registered as leaves (added to the Array).
>  
> When Join node is being created it knows what the `condition expressions` is:
> {code:java}
> =(TO_HEX($0), $3)
> {code}
> Since TO_HEX is not computed anywhere - it modifies the left input to be as 
> follows (via RelOptUtil#pushDownJoinConditions) because 
> RelBuilder#shouldMergeProject always return true. 
> {code:java}
> LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
> VARCHAR $f3)
> {code}
> where `VARCHAR $f3` is a result of TO_HEX. Note that the list of leaves is 
> not updated.
> [https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L2571]
>  
> Finally, when identifier "query.fA1_2" is being converted (via 
> SqlToRelConverter#convertIdentifier) for the top-most node
> {code:java}
> top-most node:
> LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
> VARCHAR id0, VARCHAR fB1)
>   LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
> VARCHAR $f3, VARCHAR id0, VARCHAR fB1)
>     LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
> VARCHAR $f3)
>       LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
>     LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
> Blackboard perform a lookup (via SqlToRelConverter#lookupExp), in process of 
> which LookupContext is created.
> In a constructor, LookupContext performs flatten, which recursively traverses 
>  tree of nodes (from above codeblock) and checks the leaves to see if they 
> contain such expression. When it does get to the modified left input of a 
> join it does not get a match on it and continues further down to a TableScan.
> When it finally flattens the result, TableScan's RecordType knows nothing 
> about a duplicated field `fA1_2`, causing an error above.
>  
> I think a viable solution would be to modify Join creation to register a 
> resulting join inputs as leaves (when they are modified). Alternative 
> approach would be to not merge Projects when join needs to modify an input.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to