[
https://issues.apache.org/jira/browse/DRILL-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745981#comment-14745981
]
Khurram Faraaz commented on DRILL-3783:
---------------------------------------
Query plan for the query that returns incorrect results:
{code}
0: jdbc:drill:schema=dfs.tmp> explain plan for select count(c1) from (select
cast(columns[0] as int) c1 from `testWindow.csv`) union all (select
cast(columns[0] as int) c2 from `testWindow.csv`);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        UnionAll(all=[true])
00-04          StreamAgg(group=[{}], EXPR$0=[COUNT($0)])
00-06            Project(c1=[CAST(ITEM($0, 0)):INTEGER])
00-07              Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/testWindow.csv, numFiles=1, columns=[`columns`[0]], files=[maprfs:///tmp/testWindow.csv]]])
00-03          Project(c2=[CAST(ITEM($0, 0)):INTEGER])
00-05            Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/testWindow.csv, numFiles=1, columns=[`columns`[0]], files=[maprfs:///tmp/testWindow.csv]]])
{code}
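In the plan above, the StreamAgg sits under only the first input of UnionAll. That is consistent with the query being read as (select count(c1) from (...)) union all (select ...), which would yield one count row plus the 11 rows of the second select. The following is only a rough sketch of the two readings, run against SQLite through Python's sqlite3 module rather than against Drill. The table name `t`, the column name `c`, and the function name are assumptions made for illustration, and the values are copied from the result sets in the issue description. SQLite does not accept parentheses around the individual operands of UNION ALL, so they are omitted here.

```python
import sqlite3

# Values copied from the c1 / c2 result sets in the issue description.
VALUES = [100, 10, 2, 50, 55, 67, 113, 119, 89, 57, 61]


def run_both_readings():
    """Return the rows produced by the two possible readings of the query."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for testWindow.csv: one integer column.
    conn.execute("CREATE TABLE t (c INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in VALUES])

    # Reading 1: COUNT applies to the first subquery only, and its single
    # result row is then combined with the rows of the second select.
    outer_union = conn.execute(
        "SELECT count(c1) FROM (SELECT c AS c1 FROM t) "
        "UNION ALL SELECT c AS c2 FROM t"
    ).fetchall()

    # Reading 2: the UNION ALL is evaluated inside the derived table,
    # and COUNT applies to the combined rows.
    inner_union = conn.execute(
        "SELECT count(c1) FROM "
        "(SELECT c AS c1 FROM t UNION ALL SELECT c AS c2 FROM t)"
    ).fetchall()

    conn.close()
    return outer_union, inner_union


if __name__ == "__main__":
    outer_union, inner_union = run_both_readings()
    print(len(outer_union), "rows for reading 1")
    print(inner_union, "for reading 2")
```

In this sketch, reading 1 returns 12 rows (the count 11 plus the 11 column values), and reading 2 returns a single row containing 22. This only illustrates how the two readings differ; it says nothing about which one Drill's parser is expected to choose.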
> Incorrect results : COUNT(<column-name>) over results returned by UNION ALL
> ----------------------------------------------------------------------------
>
> Key: DRILL-3783
> URL: https://issues.apache.org/jira/browse/DRILL-3783
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.2.0
> Environment: 4 node cluster on CentOS
> Reporter: Khurram Faraaz
> Assignee: Sean Hsuan-Yi Chu
> Priority: Critical
> Fix For: 1.2.0
>
>
> COUNT over the results returned by a UNION ALL query returns incorrect
> results. The query below previously returned an exception (please see
> DRILL-2637). That JIRA was marked as fixed; however, the query now returns
> incorrect results.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select count(c1) from (select cast(columns[0]
> as int) c1 from `testWindow.csv`) union all (select cast(columns[0] as int)
> c2 from `testWindow.csv`);
> +---------+
> | EXPR$0 |
> +---------+
> | 11 |
> | 100 |
> | 10 |
> | 2 |
> | 50 |
> | 55 |
> | 67 |
> | 113 |
> | 119 |
> | 89 |
> | 57 |
> | 61 |
> +---------+
> 12 rows selected (0.753 seconds)
> {code}
> The results returned by the queries on the LHS and RHS of the UNION ALL operator are:
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from
> `testWindow.csv`;
> +------+
> | c1 |
> +------+
> | 100 |
> | 10 |
> | 2 |
> | 50 |
> | 55 |
> | 67 |
> | 113 |
> | 119 |
> | 89 |
> | 57 |
> | 61 |
> +------+
> 11 rows selected (0.197 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c2 from
> `testWindow.csv`;
> +------+
> | c2 |
> +------+
> | 100 |
> | 10 |
> | 2 |
> | 50 |
> | 55 |
> | 67 |
> | 113 |
> | 119 |
> | 89 |
> | 57 |
> | 61 |
> +------+
> 11 rows selected (0.173 seconds)
> {code}
> Note that enclosing the entire UNION ALL in its own pair of parentheses
> returns the correct result. We do not want to return incorrect results to the
> user when those parentheses are missing.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select count(c1) from ((select cast(columns[0]
> as int) c1 from `testWindow.csv`) union all (select cast(columns[0] as int)
> c2 from `testWindow.csv`));
> +---------+
> | EXPR$0 |
> +---------+
> | 22 |
> +---------+
> 1 row selected (0.234 seconds)
> {code}