[jira] [Commented] (DRILL-2988) Correlated exists subquery returns wrong result if join columns in subquery are not fully qualified

Jinfeng Ni (JIRA) Fri, 22 May 2015 10:52:35 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556504#comment-14556504
 ]


Jinfeng Ni commented on DRILL-2988:
-----------------------------------

I think name resolution differs between case 1 and case 2 because the naming 
scopes differ. 

case 1:
{code}
select * from t1 where exists (select * from t3 where a1 = a3);
{code}

case 2:

{code}
select * from t1, t2 where a1 = a2;
{code}

In case 1, because of the parentheses, the scope used to resolve "a1" and "a3" 
is the subquery itself.  As such, Drill resolves both "a1" and "a3" to t3, 
since t3 is schema-less and could contain either column. 

In case 2, the scope used to resolve "a1" and "a2" is the query itself. That 
scope contains two tables, and Drill cannot determine which table (t1 or t2) 
has columns "a1" and "a2". That is why we see the error "Column 'a1' is 
ambiguous".
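
The scoping rule described for the two cases can be sketched as a toy model in 
Python. This is only an illustration under stated assumptions, not Drill's or 
Calcite's actual resolver: the function name resolve, the AmbiguousColumn 
exception, and the dict-based scope layout are all hypothetical. A schema-less 
table is modelled as None, meaning it might contain any column.

```python
class AmbiguousColumn(Exception):
    """Raised when one scope offers more than one candidate table."""


def resolve(column, scopes):
    """Resolve an unqualified column name to a table name.

    scopes is a list of scopes, innermost first. Each scope maps a table
    name to its set of known columns, or to None for a schema-less table
    (hypothetical model: such a table may contain any column).
    """
    for scope in scopes:
        candidates = [
            table for table, columns in scope.items()
            if columns is None or column in columns
        ]
        if len(candidates) > 1:
            raise AmbiguousColumn(f"Column '{column}' is ambiguous")
        if candidates:
            # The innermost scope that can supply the column wins.
            return candidates[0]
    raise LookupError(f"Column '{column}' not found")


# Case 1: the subquery scope holds only t3; t1 lives in the outer scope.
case1_scopes = [{"t3": None}, {"t1": None}]
# Case 2: a single scope holds both t1 and t2.
case2_scopes = [{"t1": None, "t2": None}]
# Same shape as case 1, but with column lists known up front.
known_scopes = [{"t3": {"a3", "b3", "c3"}}, {"t1": {"a1", "b1", "c1"}}]

print(resolve("a1", case1_scopes))  # t3
print(resolve("a1", known_scopes))  # t1
try:
    resolve("a1", case2_scopes)
except AmbiguousColumn as err:
    print(err)  # Column 'a1' is ambiguous
```

In this model the schema-less t3 claims "a1" before the outer scope is ever 
consulted, while two schema-less tables in one scope cannot be told apart.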

One may ask why the same error is not raised for case 1. First, as explained, 
the parentheses change the naming scope. That is quite common behavior in 
programming languages.  Second, if we raised the "ambiguous column" error for 
case 1, the user would have to add the table prefix "t3" explicitly, even for 
columns coming from "t3". People might argue that such qualification is not 
necessary for columns within the naming scope. 

{code}
select * from t1 where exists (select * from t3 where t3.a3 = 1 and t3.b3 > 20 
and t3.c3 < 300);

instead of

select * from t1 where exists (select * from t3 where a3 = 1 and b3 > 20 and c3 
< 300);

{code}
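
For comparison, here is a minimal sketch using Python's built-in sqlite3 
module, with the rows from the issue description loaded into tables that have 
declared columns. SQLite is an assumption chosen only for illustration and says 
nothing about Drill's internals. Because the engine knows that t3 has no "a1" 
column, the unqualified name falls through to the outer t1, and the 
unqualified and fully qualified forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (a1 integer, b1 text, c1 text);
    create table t3 (a3 integer, b3 text, c3 text);
""")

# Rows copied from the issue description; None stands for SQL NULL.
t1_rows = [
    (1, "aaaaa", "2015-01-01"),
    (2, "bbbbb", "2015-01-02"),
    (3, "ccccc", "2015-01-03"),
    (4, None, "2015-01-04"),
    (5, "eeeee", "2015-01-05"),
    (6, "fffff", "2015-01-06"),
    (7, "ggggg", "2015-01-07"),
    (None, "hhhhh", "2015-01-08"),
    (9, "iiiii", None),
    (10, "jjjjj", "2015-01-10"),
]
conn.executemany("insert into t1 values (?, ?, ?)", t1_rows)
# t3 holds the same first three rows, as in the issue description.
conn.executemany("insert into t3 values (?, ?, ?)", t1_rows[:3])

unqualified = conn.execute(
    "select a1 from t1 where exists (select * from t3 where a1 = a3) order by a1"
).fetchall()
qualified = conn.execute(
    "select a1 from t1 where exists "
    "(select * from t3 where t1.a1 = t3.a3) order by a1"
).fetchall()

print(unqualified)  # [(1,), (2,), (3,)]
print(qualified)    # [(1,), (2,), (3,)]
```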

For the two queries involving a scalar subquery, please either re-open 
DRILL-1937 or see DRILL-2949, which is relevant to those issues.




> Correlated exists subquery returns wrong result if join columns in subquery 
> are not fully qualified
> ---------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2988
>                 URL: https://issues.apache.org/jira/browse/DRILL-2988
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni
>              Labels: document_if_not_fixed
>
> Two parquet tables:
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+------------+
> |     a1     |     b1     |     c1     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | null       | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | null       | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | null       |
> | 10         | jjjjj      | 2015-01-10 |
> +------------+------------+------------+
> 10 rows selected (0.109 seconds)
> 0: jdbc:drill:schema=dfs> select * from t3;
> +------------+------------+------------+
> |     a3     |     b3     |     c3     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> +------------+------------+------------+
> 3 rows selected (0.106 seconds)
> {code}
> When column name is not qualified with the table name in the join condition, 
> we get wrong result:
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1 where exists (select * from t3 
> where a1 = a3);
> +------------+------------+------------+
> |     a1     |     b1     |     c1     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | null       | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | null       | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | null       |
> | 10         | jjjjj      | 2015-01-10 |
> +------------+------------+------------+
> 10 rows selected (0.177 seconds)
> {code}
> Works correctly when column names are fully qualified ...
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1 where exists (select * from t3 
> where t1.a1 = t3.a3);
> +------------+------------+------------+
> |     a1     |     b1     |     c1     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> +------------+------------+------------+
> 3 rows selected (0.353 seconds)
> {code}
> Plan for the query with the wrong result:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select * from t1 where exists 
> (select * from t3 where a1 = a3);
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(T124¦¦*=[$0])
> 00-03          NestedLoopJoin(condition=[true], joinType=[inner])
> 00-05            Project(T124¦¦*=[$0])
> 00-07              Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
> selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`*`]]])
> 00-04            SelectionVectorRemover
> 00-06              Filter(condition=[IS NOT NULL($0)])
> 00-08                StreamAgg(group=[{}], agg#0=[MIN($0)])
> 00-09                  Project($f0=[true])
> 00-10                    SelectionVectorRemover
> 00-11                      Filter(condition=[=($0, $1)])
> 00-12                        Project(a1=[$1], a3=[$0])
> 00-13                          Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t3]], 
> selectionRoot=/drill/testdata/aggregation/t3, numFiles=1, columns=[`a1`, 
> `a3`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
