Victoria Markman created DRILL-3657:
---------------------------------------
Summary: Wrong result with SUM(1) window function when multiple
partitions are present
Key: DRILL-3657
URL: https://issues.apache.org/jira/browse/DRILL-3657
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.2.0
Reporter: Victoria Markman
Assignee: Jinfeng Ni
Priority: Critical
The common use case for this would be: SUM( CASE WHEN x=y THEN 1 ELSE 0 END)
Wrong result:
{code}
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . > b2,
. . . . . . . . . . . . > c2,
. . . . . . . . . . . . > sum(1) over(partition by b2 order by c2),
. . . . . . . . . . . . > sum(1) over(partition by c2)
. . . . . . . . . . . . > from
. . . . . . . . . . . . > t2
. . . . . . . . . . . . > order by
. . . . . . . . . . . . > 1,2;
+--------+-------------+---------+---------+
| b2 | c2 | EXPR$2 | EXPR$3 |
+--------+-------------+---------+---------+
| aaaaa | 2015-01-01 | 1 | 1 |
| bbbbb | 2015-01-02 | 3 | 9 |
| bbbbb | 2015-01-02 | 3 | 9 |
| bbbbb | 2015-01-02 | 3 | 9 |
| ccccc | 2015-01-03 | 1 | 1 |
| ddddd | 2015-01-04 | 1 | 1 |
| eeeee | 2015-01-05 | 1 | 1 |
| fffff | 2015-01-06 | 1 | 1 |
| ggggg | 2015-01-07 | 2 | 4 |
| ggggg | 2015-01-07 | 2 | 4 |
| hhhhh | 2015-01-08 | 1 | 1 |
| iiiii | 2015-01-09 | 1 | 1 |
| zzz | 2014-12-31 | 1 | 1 |
+--------+-------------+---------+---------+
13 rows selected (0.204 seconds)
{code}
Explain plan:
{code}
| 00-00 Screen
00-01 Project(b2=[$0], c2=[$1], EXPR$2=[$2], EXPR$3=[$3])
00-02 SelectionVectorRemover
00-03 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-04 Window(window#0=[window(partition {1} order by [] range
between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($2)])])
00-05 SelectionVectorRemover
00-06 Sort(sort0=[$1], dir0=[ASC])
00-07 Window(window#0=[window(partition {0} order by [1] range
between UNBOUNDED PRECEDING and CURRENT ROW aggs [SUM($2)])])
00-08 SelectionVectorRemover
00-09 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-10 Project(b2=[$1], c2=[$0])
00-11 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/t2]],
selectionRoot=maprfs:/drill/testdata/subqueries/t2, numFiles=1, columns=[`b2`,
`c2`]]])
{code}
If you have a query with only one of these partitions, result is correct :
{code}
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . > b2,
. . . . . . . . . . . . > c2,
. . . . . . . . . . . . > sum(1) over(partition by b2 order by c2)
. . . . . . . . . . . . > from
. . . . . . . . . . . . > t2
. . . . . . . . . . . . > order by
. . . . . . . . . . . . > 1,2;
+--------+-------------+---------+
| b2 | c2 | EXPR$2 |
+--------+-------------+---------+
| aaaaa | 2015-01-01 | 1 |
| bbbbb | 2015-01-02 | 3 |
| bbbbb | 2015-01-02 | 3 |
| bbbbb | 2015-01-02 | 3 |
| ccccc | 2015-01-03 | 1 |
| ddddd | 2015-01-04 | 1 |
| eeeee | 2015-01-05 | 1 |
| fffff | 2015-01-06 | 1 |
| ggggg | 2015-01-07 | 2 |
| ggggg | 2015-01-07 | 2 |
| hhhhh | 2015-01-08 | 1 |
| iiiii | 2015-01-09 | 1 |
| zzz | 2014-12-31 | 1 |
+--------+-------------+---------+
13 rows selected (0.196 seconds)
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . > b2,
. . . . . . . . . . . . > c2,
. . . . . . . . . . . . > sum(1) over(partition by c2)
. . . . . . . . . . . . > from
. . . . . . . . . . . . > t2
. . . . . . . . . . . . > order by
. . . . . . . . . . . . > 1,2;
+--------+-------------+---------+
| b2 | c2 | EXPR$2 |
+--------+-------------+---------+
| aaaaa | 2015-01-01 | 1 |
| bbbbb | 2015-01-02 | 3 |
| bbbbb | 2015-01-02 | 3 |
| bbbbb | 2015-01-02 | 3 |
| ccccc | 2015-01-03 | 1 |
| ddddd | 2015-01-04 | 1 |
| eeeee | 2015-01-05 | 1 |
| fffff | 2015-01-06 | 1 |
| ggggg | 2015-01-07 | 2 |
| ggggg | 2015-01-07 | 2 |
| hhhhh | 2015-01-08 | 1 |
| iiiii | 2015-01-09 | 1 |
| zzz | 2014-12-31 | 1 |
+--------+-------------+---------+
13 rows selected (0.179 seconds)
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)