[ 
https://issues.apache.org/jira/browse/DRILL-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183572#comment-15183572
 ] 

Zelaine Fong commented on DRILL-4477:
-------------------------------------

Do wrong results occur only when the two columns in the equality join 
predicate have different types?  If so, the scope of this problem is 
probably narrower, since I would expect join columns to have the same 
type in most cases.

> Wrong Plan (potentially wrong result) if wrapping a query with SELECT * FROM
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-4477
>                 URL: https://issues.apache.org/jira/browse/DRILL-4477
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Sean Hsuan-Yi Chu
>            Priority: Blocker
>             Fix For: 1.6.0
>
>         Attachments: t1.json, t2.json
>
>
> For example, a query  
> {code}
> select * from (select s.name, v.name, v.registration from 
> cp.`tpch/region.parquet` s left outer join cp.`tpch/nation.parquet` v
> on (s.name = v.name) 
> where s.age < 30) t 
> {code}
> gives a plan as below:
> {code}
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(name=[$0], name0=[$1], registration=[$2])
> 00-02        Project(name=[$0], name0=[$0], registration=[$3])
> 00-03          Project(name=[$2], age=[$3], name0=[$0], registration=[$1])
> 00-04            HashJoin(condition=[=($2, $0)], joinType=[right])
> 00-06              Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], 
> selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, 
> usedMetadataFile=false, columns=[`name`, `registration`]]])
> 00-05              Project(name0=[$0], age=[$1])
> 00-07                SelectionVectorRemover
> 00-08                  Filter(condition=[<($1, 30)])
> 00-09                    Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], 
> selectionRoot=classpath:/tpch/region.parquet, numFiles=1, 
> usedMetadataFile=false, columns=[`name`, `age`]]])
> {code}
> In line 00-02, both name and name0 point at the same incoming column 
> (probably due to the JOIN condition). 
> However, the fact that these two columns appear in the JOIN condition does 
> not mean they must be equal, since implicit casting might be invoked to 
> evaluate the JOIN condition.
> Interestingly, if the SELECT * FROM wrapper is removed, this bug won't be 
> exposed: 
> {code}
> select s.name, v.name, v.registration from cp.`tpch/region.parquet` s left 
> outer join cp.`tpch/nation.parquet` v on (s.name = v.name) 
> where s.age < 30
> {code}
> gives 
> {code}
> 00-00    Screen
> 00-01      Project(name=[$0], name0=[$1], registration=[$2])
> 00-02        Project(name=[$2], name0=[$0], registration=[$1])
> 00-03          HashJoin(condition=[=($2, $0)], joinType=[right])
> 00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/nation.parquet]], 
> selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, 
> usedMetadataFile=false, columns=[`name`, `registration`]]])
> 00-04            Project(name0=[$0])
> 00-06              Project(name=[$0])
> 00-07                SelectionVectorRemover
> 00-08                  Filter(condition=[<($1, 30)])
> 00-09                    Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], 
> selectionRoot=classpath:/tpch/region.parquet, numFiles=1, 
> usedMetadataFile=false, columns=[`name`, `age`]]])
> {code}
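
The point about implicit casting can be illustrated with a small sketch. This is not Drill code: the rows are hypothetical (the contents of t1.json and t2.json are not shown in this message), the join is a simplified inner join rather than the left outer join in the query, and the implicit cast is modeled as a plain string comparison, which is an assumption and not Drill's actual cast rules. It only shows that two keys can satisfy the join condition while holding different values, so projecting both output columns from one input column changes the result.

```python
# Hypothetical rows; the real attachments are not reproduced here.
left_rows = [{"name": 10, "age": 25}]                 # name stored as an integer
right_rows = [{"name": "10", "registration": "A1"}]   # name stored as a string


def keys_match(a, b):
    """Equality with an implicit cast, modeled (as an assumption) by comparing strings."""
    return str(a) == str(b)


def join(left, right):
    """Nested-loop inner join on name that keeps both name columns separately."""
    return [
        {"name": l["name"], "name0": r["name"], "registration": r["registration"]}
        for l in left
        for r in right
        if keys_match(l["name"], r["name"])
    ]


def collapse_names(rows):
    """Mimic the projection in line 00-02: name0 reads the same column as name."""
    return [{**row, "name0": row["name"]} for row in rows]


correct = join(left_rows, right_rows)
collapsed = collapse_names(correct)

print(correct)
print(collapsed)
```

With same-typed join columns the two results would coincide, which is why the type mismatch matters for the scope question raised in the comment.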



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
