Victoria Markman created DRILL-2988: ---------------------------------------
Summary: Correlated exists subquery returns wrong results if join columns in subquery are not fully qualified Key: DRILL-2988 URL: https://issues.apache.org/jira/browse/DRILL-2988 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Jinfeng Ni Two parquet tables: {code} 0: jdbc:drill:schema=dfs> select * from t1; +------------+------------+------------+ | a1 | b1 | c1 | +------------+------------+------------+ | 1 | aaaaa | 2015-01-01 | | 2 | bbbbb | 2015-01-02 | | 3 | ccccc | 2015-01-03 | | 4 | null | 2015-01-04 | | 5 | eeeee | 2015-01-05 | | 6 | fffff | 2015-01-06 | | 7 | ggggg | 2015-01-07 | | null | hhhhh | 2015-01-08 | | 9 | iiiii | null | | 10 | jjjjj | 2015-01-10 | +------------+------------+------------+ 10 rows selected (0.109 seconds) 0: jdbc:drill:schema=dfs> select * from t3; +------------+------------+------------+ | a3 | b3 | c3 | +------------+------------+------------+ | 1 | aaaaa | 2015-01-01 | | 2 | bbbbb | 2015-01-02 | | 3 | ccccc | 2015-01-03 | +------------+------------+------------+ 3 rows selected (0.106 seconds) {code} When column name is not qualified with the table name in the join condition, we get wrong result: {code} 0: jdbc:drill:schema=dfs> select * from t1 where exists (select * from t3 where a1 = a3); +------------+------------+------------+ | a1 | b1 | c1 | +------------+------------+------------+ | 1 | aaaaa | 2015-01-01 | | 2 | bbbbb | 2015-01-02 | | 3 | ccccc | 2015-01-03 | | 4 | null | 2015-01-04 | | 5 | eeeee | 2015-01-05 | | 6 | fffff | 2015-01-06 | | 7 | ggggg | 2015-01-07 | | null | hhhhh | 2015-01-08 | | 9 | iiiii | null | | 10 | jjjjj | 2015-01-10 | +------------+------------+------------+ 10 rows selected (0.177 seconds) {code} Works correctly when column names are fully qualified ... {code} 0: jdbc:drill:schema=dfs> select * from t1 where exists (select * from t3 where t1.a1 = t3.a3); +------------+------------+------------+ | a1 | b1 | c1 | +------------+------------+------------+ | 1 | aaaaa | 2015-01-01 | | 2 | bbbbb | 2015-01-02 | | 3 | ccccc | 2015-01-03 | +------------+------------+------------+ 3 rows selected (0.353 seconds) {code} Plan for the query with the wrong result: {code} 0: jdbc:drill:schema=dfs> explain plan for select * from t1 where exists (select * from t3 where a1 = a3); +------------+------------+ | text | json | +------------+------------+ | 00-00 Screen 00-01 Project(*=[$0]) 00-02 Project(T124¦¦*=[$0]) 00-03 NestedLoopJoin(condition=[true], joinType=[inner]) 00-05 Project(T124¦¦*=[$0]) 00-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`*`]]]) 00-04 SelectionVectorRemover 00-06 Filter(condition=[IS NOT NULL($0)]) 00-08 StreamAgg(group=[{}], agg#0=[MIN($0)]) 00-09 Project($f0=[true]) 00-10 SelectionVectorRemover 00-11 Filter(condition=[=($0, $1)]) 00-12 Project(a1=[$1], a3=[$0]) 00-13 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t3]], selectionRoot=/drill/testdata/aggregation/t3, numFiles=1, columns=[`a1`, `a3`]]]) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)