[jira] [Commented] (DRILL-2988) Correlated exists subquery returns wrong result if join columns in subquery are not fully qualified

Jinfeng Ni (JIRA) Fri, 22 May 2015 11:30:40 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556566#comment-14556566
 ]


Jinfeng Ni commented on DRILL-2988:
-----------------------------------

We may argue why Postgres or other schema-aware system works for case 1:

{code}
select * from t1 where exists (select * from t3 where a1 = a3);
{code}

In Postgres or other schema-ware system,  it will first try to resolve both 
"a1" and "a3" within the scope of (), which has table "t3" only.  Since schema 
is known, a3 is successfully resolved with t3, while "a1" is not resolved 
within the scope of ().  Then, the system will try to resolve "a1" in the outer 
naming scope, which has table t1 and it will eventually resolve "a1" with "t1".

For schema-less system like Drill,  it will first resolve both "a1" and "a3" 
within the scope of (), which has table "t3" only. Since it does not have the 
schema, and it will by default assume "t3" has those columns. If user want to 
express "a1" is not coming from "t3" but "t1", they have to add qualification 
"t1.a1" in the subquery ().



> Correlated exists subquery returns wrong result if join columns in subquery 
> are not fully qualified
> ---------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2988
>                 URL: https://issues.apache.org/jira/browse/DRILL-2988
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni
>              Labels: document_if_not_fixed
>
> Two parquet tables:
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+------------+
> |     a1     |     b1     |     c1     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | null       | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | null       | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | null       |
> | 10         | jjjjj      | 2015-01-10 |
> +------------+------------+------------+
> 10 rows selected (0.109 seconds)
> 0: jdbc:drill:schema=dfs> select * from t3;
> +------------+------------+------------+
> |     a3     |     b3     |     c3     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> +------------+------------+------------+
> 3 rows selected (0.106 seconds)
> {code}
> When column name is not qualified with the table name in the join condition, 
> we get wrong result:
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1 where exists (select * from t3 
> where a1 = a3);
> +------------+------------+------------+
> |     a1     |     b1     |     c1     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | null       | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | null       | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | null       |
> | 10         | jjjjj      | 2015-01-10 |
> +------------+------------+------------+
> 10 rows selected (0.177 seconds)
> {code}
> Works correctly when column names are fully qualified ...
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1 where exists (select * from t3 
> where t1.a1 = t3.a3);
> +------------+------------+------------+
> |     a1     |     b1     |     c1     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> +------------+------------+------------+
> 3 rows selected (0.353 seconds)
> {code}
> Plan for the query with the wrong result:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select * from t1 where exists 
> (select * from t3 where a1 = a3);
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(T124¦¦*=[$0])
> 00-03          NestedLoopJoin(condition=[true], joinType=[inner])
> 00-05            Project(T124¦¦*=[$0])
> 00-07              Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
> selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`*`]]])
> 00-04            SelectionVectorRemover
> 00-06              Filter(condition=[IS NOT NULL($0)])
> 00-08                StreamAgg(group=[{}], agg#0=[MIN($0)])
> 00-09                  Project($f0=[true])
> 00-10                    SelectionVectorRemover
> 00-11                      Filter(condition=[=($0, $1)])
> 00-12                        Project(a1=[$1], a3=[$0])
> 00-13                          Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t3]], 
> selectionRoot=/drill/testdata/aggregation/t3, numFiles=1, columns=[`a1`, 
> `a3`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-2988) Correlated exists subquery returns wrong result if join columns in subquery are not fully qualified

Reply via email to