[ 
https://issues.apache.org/jira/browse/CALCITE-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003804#comment-17003804
 ] 

Kirill Kozlov commented on CALCITE-3571:
----------------------------------------

It appears that the fix for https://issues.apache.org/jira/browse/CALCITE-3575 
also resolves this issue as well.

> RelBuilder#shouldMergeProject throws an exception for JOIN with complex 
> conditions
> ----------------------------------------------------------------------------------
>
>                 Key: CALCITE-3571
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3571
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Kirill Kozlov
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Table `tblA` schema: 
> ||field_name||field_type||
> |id|VARBINARY|
> |fA1|VARCHAR|
>  Table `tblB` schema:
> ||field_name||field_type||
> |id|VARCHAR|
> |fB1|VARCHAR|
> For the following query:
> {code:java}
> WITH query as (select SELECT id, fA1, fA1 AS fA1_2 FROM tblA)
> SELECT fA1, fB1, fA1_2
> FROM query
> JOIN tblB ON (TO_HEX(query.id) = tblB.id){code}
> TO_HEX is used as an example, but it can be any other expression; for 
> instance `cast(query.id as Integer)`.
> Out of bound exception is thrown in SqlToRelConverter. 
>  
> Before joins are created left and right paths are parsed first. For the 1st 
> query above they are as follows:
> {code:java}
> Left:
> LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2)
>   LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
> Right:
> LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
> As they are processed - they are registered as leaves (added to the Array).
>  
> When Join node is being created it knows what the `condition expressions` is:
> {code:java}
> =(TO_HEX($0), $3)
> {code}
> Since TO_HEX is not computed anywhere - it modifies the left input to be as 
> follows (via RelOptUtil#pushDownJoinConditions) because 
> RelBuilder#shouldMergeProject always return true. 
> {code:java}
> LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
> VARCHAR $f3)
> {code}
> where `VARCHAR $f3` is a result of TO_HEX. Note that the list of leaves is 
> not updated.
> [https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L2571]
>  
> Finally, when identifier "query.fA1_2" is being converted (via 
> SqlToRelConverter#convertIdentifier) for the top-most node
> {code:java}
> top-most node:
> LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
> VARCHAR id0, VARCHAR fB1)
>   LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
> VARCHAR $f3, VARCHAR id0, VARCHAR fB1)
>     LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
> VARCHAR $f3)
>       LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
>     LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
> Blackboard perform a lookup (via SqlToRelConverter#lookupExp), in process of 
> which LookupContext is created.
> In a constructor, LookupContext performs flatten, which recursively traverses 
>  tree of nodes (from above codeblock) and checks the leaves to see if they 
> contain such expression. When it does get to the modified left input of a 
> join it does not get a match on it and continues further down to a TableScan.
> When it finally flattens the result, TableScan's RecordType knows nothing 
> about a duplicated field `fA1_2`, causing an error above.
>  
> I think a viable solution would be to modify Join creation to register a 
> resulting join inputs as leaves (when they are modified). Alternative 
> approach would be to not merge Projects when join needs to modify an input.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to