[
https://issues.apache.org/jira/browse/CALCITE-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991417#comment-16991417
]
Wang Yanlin commented on CALCITE-3571:
--------------------------------------
Same with https://issues.apache.org/jira/browse/CALCITE-3575 ?
> RelBuilder#shouldMergeProject throws an exception for JOIN with complex
> conditions
> ----------------------------------------------------------------------------------
>
> Key: CALCITE-3571
> URL: https://issues.apache.org/jira/browse/CALCITE-3571
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.21.0
> Reporter: Kirill Kozlov
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Table `tblA` schema:
> ||field_name||field_type||
> |id|VARBINARY|
> |fA1|VARCHAR|
> Table `tblB` schema:
> ||field_name||field_type||
> |id|VARCHAR|
> |fB1|VARCHAR|
> For the following query:
> {code:java}
> WITH query as (select SELECT id, fA1, fA1 AS fA1_2 FROM tblA)
> SELECT fA1, fB1, fA1_2
> FROM query
> JOIN tblB ON (TO_HEX(query.id) = tblB.id){code}
> TO_HEX is used as an example, but it can be any other expression; for
> instance `cast(query.id as Integer)`.
> Out of bound exception is thrown in SqlToRelConverter.
>
> Before joins are created left and right paths are parsed first. For the 1st
> query above they are as follows:
> {code:java}
> Left:
> LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2)
> LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
> Right:
> LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
> As they are processed - they are registered as leaves (added to the Array).
>
> When Join node is being created it knows what the `condition expressions` is:
> {code:java}
> =(TO_HEX($0), $3)
> {code}
> Since TO_HEX is not computed anywhere - it modifies the left input to be as
> follows (via RelOptUtil#pushDownJoinConditions) because
> RelBuilder#shouldMergeProject always return true.
> {code:java}
> LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2,
> VARCHAR $f3)
> {code}
> where `VARCHAR $f3` is a result of TO_HEX. Note that the list of leaves is
> not updated.
> [https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L2571]
>
> Finally, when identifier "query.fA1_2" is being converted (via
> SqlToRelConverter#convertIdentifier) for the top-most node
> {code:java}
> top-most node:
> LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2,
> VARCHAR id0, VARCHAR fB1)
> LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2,
> VARCHAR $f3, VARCHAR id0, VARCHAR fB1)
> LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2,
> VARCHAR $f3)
> LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
> LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
> Blackboard perform a lookup (via SqlToRelConverter#lookupExp), in process of
> which LookupContext is created.
> In a constructor, LookupContext performs flatten, which recursively traverses
> tree of nodes (from above codeblock) and checks the leaves to see if they
> contain such expression. When it does get to the modified left input of a
> join it does not get a match on it and continues further down to a TableScan.
> When it finally flattens the result, TableScan's RecordType knows nothing
> about a duplicated field `fA1_2`, causing an error above.
>
> I think a viable solution would be to modify Join creation to register a
> resulting join inputs as leaves (when they are modified). Alternative
> approach would be to not merge Projects when join needs to modify an input.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)