[
https://issues.apache.org/jira/browse/DRILL-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zelaine Fong updated DRILL-2543:
--------------------------------
Fix Version/s: (was: Future)
1.3.0
> Correlated subquery where outer table contains NULL values returns seemingly
> wrong result
> ------------------------------------------------------------------------------------------
>
> Key: DRILL-2543
> URL: https://issues.apache.org/jira/browse/DRILL-2543
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 0.8.0
> Reporter: Victoria Markman
> Assignee: Jinfeng Ni
> Priority: Critical
> Fix For: 1.3.0
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+------------+
> | a1 | b1 | c1 |
> +------------+------------+------------+
> | 1 | 2015-03-01 | aaaaa |
> | 2 | 2015-03-02 | bbbbb |
> | null | null | null |
> +------------+------------+------------+
> 3 rows selected (0.064 seconds)
> 0: jdbc:drill:schema=dfs> select * from t2;
> +------------+------------+------------+
> | a2 | b2 | c2 |
> +------------+------------+------------+
> | 5 | 2017-03-01 | a |
> +------------+------------+------------+
> 1 row selected (0.07 seconds)
> 0: jdbc:drill:schema=dfs> select t1.c1, count(*) from t1 where t1.b1 not in
> (select b2 from t2 where t1.a1 = t2.a2) group by t1.c1 order by t1.c1;
> +------------+------------+
> | c1 | EXPR$1 |
> +------------+------------+
> | aaaaa | 1 |
> | bbbbb | 1 |
> +------------+------------+
> 2 rows selected (0.32 seconds)
> {code}
> Postgres returns row from the outer table where a1 is null.
> This is part that I don't understand, because join condition in the subquery
> should have eliminated row where a1 IS NULL. To me Drill result looks
> correct. Unless there is something different in correlated comparison
> semantics that I'm not aware of.
> {code}
> postgres=# select * from t1;
> a1 | b1 | c1
> ----+------------+-------
> 1 | 2015-03-01 | aaaaa
> 2 | 2015-03-02 | bbbbb
> | |
> (3 rows)
> {code}
> Explain plan for the query:
> {code}
> 00-01 Project(c1=[$0], EXPR$1=[$1])
> 00-02 StreamAgg(group=[{0}], EXPR$1=[COUNT()])
> 00-03 Sort(sort0=[$0], dir0=[ASC])
> 00-04 Project(c1=[$0])
> 00-05 SelectionVectorRemover
> 00-06 Filter(condition=[NOT(IS TRUE($3))])
> 00-07 HashJoin(condition=[=($1, $2)], joinType=[left])
> 00-09 Project($f1=[$0], $f3=[$2])
> 00-11 SelectionVectorRemover
> 00-13 Filter(condition=[IS NOT NULL($1)])
> 00-15 Project(c1=[$1], b1=[$0], a1=[$2])
> 00-17 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:/test/t1]], selectionRoot=/test/t1,
> numFiles=1, columns=[`c1`, `b1`, `a1`]]])
> 00-08 Project($f02=[$1], $f2=[$2])
> 00-10 StreamAgg(group=[{0, 1}], agg#0=[MIN($2)])
> 00-12 Sort(sort0=[$0], sort1=[$1], dir0=[ASC],
> dir1=[ASC])
> 00-14 Project($f0=[$1], $f02=[$2], $f1=[true])
> 00-16 HashJoin(condition=[=($2, $0)],
> joinType=[inner])
> 00-18 StreamAgg(group=[{0}])
> 00-20 Sort(sort0=[$0], dir0=[ASC])
> 00-22 Project($f0=[$1])
> 00-23 SelectionVectorRemover
> 00-24 Filter(condition=[IS NOT NULL($0)])
> 00-25 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:/test/t1]], selectionRoot=/test/t1,
> numFiles=1, columns=[`b1`, `a1`]]])
> 00-19 Project(a2=[$1], b2=[$0])
> 00-21 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:/test/t2]], selectionRoot=/test/t2,
> numFiles=1, columns=[`a2`, `b2`]]])
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)