[ 
https://issues.apache.org/jira/browse/DRILL-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183749#comment-15183749
 ] 

Victoria Markman edited comment on DRILL-4477 at 3/7/16 10:55 PM:
------------------------------------------------------------------

Same in 1.4.0:

{code}
0: jdbc:drill:drillbit=localhost> select * from sys.version;
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| version  |                 commit_id                 |                        
     commit_message                              |        commit_time         | 
build_email  |         build_time         |
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| 1.4.0    | 556d9c5dbf4b0f776b57f0d39d07111ccc0d2e7b  | MD-649: do the cost 
adjustment for $SUM0 only when LIMIT 0 is present.  | 07.01.2016 @ 18:19:31 UTC 
 | Unknown      | 07.01.2016 @ 19:52:20 UTC  |
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
1 row selected (1.082 seconds)

0: jdbc:drill:drillbit=localhost> select * from (select t1.a, t1.b, t1.c, t2.a, 
t2.b, t2.c from `tt1.json` t1, `tt2.json` t2 where t1.c = t2.c);
+----+----+----+-----+-----+-----+
| a  | b  | c  | a0  | b0  | c0  |
+----+----+----+-----+-----+-----+
| a  | b  | 1  | a   | b   | 1   |
+----+----+----+-----+-----+-----+
1 row selected (2.005 seconds)
{code}

Star is not the factor here either. I get wrong result even when I explicitly 
name columns:
{code}
0: jdbc:drill:drillbit=localhost> select a,b,c,d,e,f from (select t1.a, t1.b, 
t1.c, t2.a, t2.b, t2.c from `tt1.json` t1, `tt2.json` t2 where t1.c = t2.c) as 
t(a,b,c,d,e,f);
+----+----+----+----+----+----+
| a  | b  | c  | d  | e  | f  |
+----+----+----+----+----+----+
| a  | b  | 1  | a  | b  | 1  |
+----+----+----+----+----+----+
1 row selected (0.342 seconds)
{code}

I went all the way back to 1.0.0 to see if it reproduces there, because I was 
convinced that I've seen exactly the same problem before. It turns out that it 
reproduces there as well. Perplexing that we did not find it until now ... 


was (Author: vicky):
Same in 1.4.0:

{code}
0: jdbc:drill:drillbit=localhost> select * from sys.version;
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| version  |                 commit_id                 |                        
     commit_message                              |        commit_time         | 
build_email  |         build_time         |
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| 1.4.0    | 556d9c5dbf4b0f776b57f0d39d07111ccc0d2e7b  | MD-649: do the cost 
adjustment for $SUM0 only when LIMIT 0 is present.  | 07.01.2016 @ 18:19:31 UTC 
 | Unknown      | 07.01.2016 @ 19:52:20 UTC  |
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
1 row selected (1.082 seconds)

0: jdbc:drill:drillbit=localhost> select * from (select t1.a, t1.b, t1.c, t2.a, 
t2.b, t2.c from `tt1.json` t1, `tt2.json` t2 where t1.c = t2.c);
+----+----+----+-----+-----+-----+
| a  | b  | c  | a0  | b0  | c0  |
+----+----+----+-----+-----+-----+
| a  | b  | 1  | a   | b   | 1   |
+----+----+----+-----+-----+-----+
1 row selected (2.005 seconds)
{code}

> Wrong Plan (potentially wrong result) if wrapping a query with SELECT * FROM
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-4477
>                 URL: https://issues.apache.org/jira/browse/DRILL-4477
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Sean Hsuan-Yi Chu
>            Assignee: Sean Hsuan-Yi Chu
>            Priority: Blocker
>             Fix For: 1.6.0
>
>         Attachments: t1.json, t2.json
>
>
> For example, a query  
> {code}
> select * from (select s.name, v.name, v.registration from 
> cp.`tpch/region.parquet` s left outer join cp.`tpch/nation.parquet` v
> on (s.name = v.name) 
> where s.age < 30) t 
> {code}
> gives a plan as below:
> {code}
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(name=[$0], name0=[$1], registration=[$2])
> 00-02        Project(name=[$0], name0=[$0], registration=[$3])
> 00-03          Project(name=[$2], age=[$3], name0=[$0], registration=[$1])
> 00-04            HashJoin(condition=[=($2, $0)], joinType=[right])
> 00-06              Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], 
> selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, 
> usedMetadataFile=false, columns=[`name`, `registration`]]])
> 00-05              Project(name0=[$0], age=[$1])
> 00-07                SelectionVectorRemover
> 00-08                  Filter(condition=[<($1, 30)])
> 00-09                    Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], 
> selectionRoot=classpath:/tpch/region.parquet, numFiles=1, 
> usedMetadataFile=false, columns=[`name`, `age`]]])
> {code}
> In the line 00-02, both name and name0 point at the same incoming column 
> (probably due to the JOIN CONDITION). 
> However. the fact that these two are the JOIN condition does not make a case 
> that they must be equal since implicit casting might be invoked to perform 
> the JOIN condition.
> Interestingly, if the SELECT * FROM wrapper is removed, this bug won't be 
> exposed: 
> {code}
> select s.name, v.name, v.registration from cp.`tpch/region.parquet` s left 
> outer join cp.`tpch/nation.parquet` v on (s.name = v.name) 
> where s.age < 30
> {code}
> gives 
> {code}
> 00-00    Screen
> 00-01      Project(name=[$0], name0=[$1], registration=[$2])
> 00-02        Project(name=[$2], name0=[$0], registration=[$1])
> 00-03          HashJoin(condition=[=($2, $0)], joinType=[right])
> 00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/nation.parquet]], 
> selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, 
> usedMetadataFile=false, columns=[`name`, `registration`]]])
> 00-04            Project(name0=[$0])
> 00-06              Project(name=[$0])
> 00-07                SelectionVectorRemover
> 00-08                  Filter(condition=[<($1, 30)])
> 00-09                    Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], 
> selectionRoot=classpath:/tpch/region.parquet, numFiles=1, 
> usedMetadataFile=false, columns=[`name`, `age`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to