[ https://issues.apache.org/jira/browse/DRILL-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victoria Markman updated DRILL-4477:
------------------------------------
    Priority: Blocker  (was: Major)

> Wrong Plan (potentially wrong result) if wrapping a query with SELECT * FROM
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-4477
>                 URL: https://issues.apache.org/jira/browse/DRILL-4477
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Sean Hsuan-Yi Chu
>            Priority: Blocker
>
> For example, a query
> {code}
> select * from (select s.name, v.name, v.registration from
> cp.`tpch/region.parquet` s left outer join cp.`tpch/nation.parquet` v
> on (s.name = v.name)
> where s.age < 30) t
> {code}
> gives a plan as below:
> {code}
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(name=[$0], name0=[$1], registration=[$2])
> 00-02        Project(name=[$0], name0=[$0], registration=[$3])
> 00-03          Project(name=[$2], age=[$3], name0=[$0], registration=[$1])
> 00-04            HashJoin(condition=[=($2, $0)], joinType=[right])
> 00-06              Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]],
> selectionRoot=classpath:/tpch/nation.parquet, numFiles=1,
> usedMetadataFile=false, columns=[`name`, `registration`]]])
> 00-05              Project(name0=[$0], age=[$1])
> 00-07                SelectionVectorRemover
> 00-08                  Filter(condition=[<($1, 30)])
> 00-09                    Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]],
> selectionRoot=classpath:/tpch/region.parquet, numFiles=1,
> usedMetadataFile=false, columns=[`name`, `age`]]])
> {code}
> In line 00-02, both name and name0 point at the same incoming column
> (probably due to the JOIN condition).
> However, the fact that these two columns appear in the JOIN condition does
> not mean that they must be equal, since implicit casting might be invoked
> to evaluate the JOIN condition.
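
The point about 00-02 can be illustrated outside Drill. The following is a minimal sketch in plain Python, not Drill code: the helper `left_outer_join`, the `cast` argument that stands in for an implicit cast, and the sample rows are all hypothetical and chosen only for illustration. It shows that replacing name0 with the left-side name changes the output whenever the keys match only after casting, or when the outer join produces an unmatched row.

```python
def left_outer_join(left, right, cast):
    """Left outer join on `name`; keys are compared after applying `cast`,
    which stands in for an implicit cast in the join condition."""
    rows = []
    for l_row in left:
        matches = [r for r in right if cast(l_row["name"]) == cast(r["name"])]
        if not matches:
            # Unmatched left row: right-side columns are NULL (None here).
            rows.append((l_row["name"], None, None))
        for r_row in matches:
            rows.append((l_row["name"], r_row["name"], r_row["registration"]))
    return rows


# Hypothetical data: the left side stores names as strings, the right as ints.
left_rows = [{"name": "1"}, {"name": "2"}]
right_rows = [{"name": 1, "registration": "a"}]

# Expected projection: (name, name0, registration), each from its own side.
correct = left_outer_join(left_rows, right_rows, cast=int)

# Projection analogous to line 00-02: name0 is read from the same column as name.
collapsed = [(name, name, registration) for name, _name0, registration in correct]

print(correct)    # [('1', 1, 'a'), ('2', None, None)]
print(collapsed)  # [('1', '1', 'a'), ('2', '2', None)]
```

In this sketch the two projections differ in both rows: once by type (the string '1' versus the integer 1) and once because the unmatched row should carry a NULL in name0.
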
> Interestingly, if the SELECT * FROM wrapper is removed, this bug won't be
> exposed:
> {code}
> select s.name, v.name, v.registration from cp.`tpch/region.parquet` s left
> outer join cp.`tpch/nation.parquet` v on (s.name = v.name)
> where s.age < 30
> {code}
> gives
> {code}
> 00-00    Screen
> 00-01      Project(name=[$0], name0=[$1], registration=[$2])
> 00-02        Project(name=[$2], name0=[$0], registration=[$1])
> 00-03          HashJoin(condition=[=($2, $0)], joinType=[right])
> 00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=classpath:/tpch/nation.parquet]],
> selectionRoot=classpath:/tpch/nation.parquet, numFiles=1,
> usedMetadataFile=false, columns=[`name`, `registration`]]])
> 00-04            Project(name0=[$0])
> 00-06              Project(name=[$0])
> 00-07                SelectionVectorRemover
> 00-08                  Filter(condition=[<($1, 30)])
> 00-09                    Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]],
> selectionRoot=classpath:/tpch/region.parquet, numFiles=1,
> usedMetadataFile=false, columns=[`name`, `age`]]])
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)