Victoria Markman created DRILL-3657:
---------------------------------------

             Summary: Wrong result with SUM(1) window function when multiple 
partitions are present
                 Key: DRILL-3657
                 URL: https://issues.apache.org/jira/browse/DRILL-3657
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.2.0
            Reporter: Victoria Markman
            Assignee: Jinfeng Ni
            Priority: Critical


The common use case for this would be: SUM( CASE WHEN x=y THEN 1 ELSE 0 END)

Wrong result:
{code}
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . >         b2,
. . . . . . . . . . . . >         c2,
. . . . . . . . . . . . >         sum(1)  over(partition by b2 order by c2),
. . . . . . . . . . . . >         sum(1)  over(partition by c2)
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         t2
. . . . . . . . . . . . > order by
. . . . . . . . . . . . >         1,2;
+--------+-------------+---------+---------+
|   b2   |     c2      | EXPR$2  | EXPR$3  |
+--------+-------------+---------+---------+
| aaaaa  | 2015-01-01  | 1       | 1       |
| bbbbb  | 2015-01-02  | 3       | 9       |
| bbbbb  | 2015-01-02  | 3       | 9       |
| bbbbb  | 2015-01-02  | 3       | 9       |
| ccccc  | 2015-01-03  | 1       | 1       |
| ddddd  | 2015-01-04  | 1       | 1       |
| eeeee  | 2015-01-05  | 1       | 1       |
| fffff  | 2015-01-06  | 1       | 1       |
| ggggg  | 2015-01-07  | 2       | 4       |
| ggggg  | 2015-01-07  | 2       | 4       |
| hhhhh  | 2015-01-08  | 1       | 1       |
| iiiii  | 2015-01-09  | 1       | 1       |
| zzz    | 2014-12-31  | 1       | 1       |
+--------+-------------+---------+---------+
13 rows selected (0.204 seconds)
{code}

Explain plan:
{code}
| 00-00    Screen
00-01      Project(b2=[$0], c2=[$1], EXPR$2=[$2], EXPR$3=[$3])
00-02        SelectionVectorRemover
00-03          Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-04            Window(window#0=[window(partition {1} order by [] range 
between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($2)])])
00-05              SelectionVectorRemover
00-06                Sort(sort0=[$1], dir0=[ASC])
00-07                  Window(window#0=[window(partition {0} order by [1] range 
between UNBOUNDED PRECEDING and CURRENT ROW aggs [SUM($2)])])
00-08                    SelectionVectorRemover
00-09                      Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-10                        Project(b2=[$1], c2=[$0])
00-11                          Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/t2]], 
selectionRoot=maprfs:/drill/testdata/subqueries/t2, numFiles=1, columns=[`b2`, 
`c2`]]])
{code}

If you have a query with only one of these partitions, result is correct :
{code}
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . >         b2,
. . . . . . . . . . . . >         c2,
. . . . . . . . . . . . >         sum(1)  over(partition by b2 order by c2)
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         t2
. . . . . . . . . . . . > order by
. . . . . . . . . . . . >         1,2;
+--------+-------------+---------+
|   b2   |     c2      | EXPR$2  |
+--------+-------------+---------+
| aaaaa  | 2015-01-01  | 1       |
| bbbbb  | 2015-01-02  | 3       |
| bbbbb  | 2015-01-02  | 3       |
| bbbbb  | 2015-01-02  | 3       |
| ccccc  | 2015-01-03  | 1       |
| ddddd  | 2015-01-04  | 1       |
| eeeee  | 2015-01-05  | 1       |
| fffff  | 2015-01-06  | 1       |
| ggggg  | 2015-01-07  | 2       |
| ggggg  | 2015-01-07  | 2       |
| hhhhh  | 2015-01-08  | 1       |
| iiiii  | 2015-01-09  | 1       |
| zzz    | 2014-12-31  | 1       |
+--------+-------------+---------+
13 rows selected (0.196 seconds)

0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . >         b2,
. . . . . . . . . . . . >         c2,
. . . . . . . . . . . . >         sum(1)  over(partition by c2)
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         t2
. . . . . . . . . . . . > order by
. . . . . . . . . . . . >         1,2;
+--------+-------------+---------+
|   b2   |     c2      | EXPR$2  |
+--------+-------------+---------+
| aaaaa  | 2015-01-01  | 1       |
| bbbbb  | 2015-01-02  | 3       |
| bbbbb  | 2015-01-02  | 3       |
| bbbbb  | 2015-01-02  | 3       |
| ccccc  | 2015-01-03  | 1       |
| ddddd  | 2015-01-04  | 1       |
| eeeee  | 2015-01-05  | 1       |
| fffff  | 2015-01-06  | 1       |
| ggggg  | 2015-01-07  | 2       |
| ggggg  | 2015-01-07  | 2       |
| hhhhh  | 2015-01-08  | 1       |
| iiiii  | 2015-01-09  | 1       |
| zzz    | 2014-12-31  | 1       |
+--------+-------------+---------+
13 rows selected (0.179 seconds)
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to