[
https://issues.apache.org/jira/browse/DRILL-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698885#comment-14698885
]
Sean Hsuan-Yi Chu commented on DRILL-3657:
------------------------------------------
The window function in the top window should not have pointed at $2, which
represents a the output of the window function below.
> Wrong result with SUM(1) window function when multiple partitions are present
> -----------------------------------------------------------------------------
>
> Key: DRILL-3657
> URL: https://issues.apache.org/jira/browse/DRILL-3657
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.2.0
> Reporter: Victoria Markman
> Assignee: Sean Hsuan-Yi Chu
> Priority: Critical
> Labels: window_function
> Fix For: 1.2.0
>
>
> The common use case for this would be: SUM( CASE WHEN x=y THEN 1 ELSE 0 END)
> Wrong result:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . > b2,
> . . . . . . . . . . . . > c2,
> . . . . . . . . . . . . > sum(1) over(partition by b2 order by c2),
> . . . . . . . . . . . . > sum(1) over(partition by c2)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > t2
> . . . . . . . . . . . . > order by
> . . . . . . . . . . . . > 1,2;
> +--------+-------------+---------+---------+
> | b2 | c2 | EXPR$2 | EXPR$3 |
> +--------+-------------+---------+---------+
> | aaaaa | 2015-01-01 | 1 | 1 |
> | bbbbb | 2015-01-02 | 3 | 9 |
> | bbbbb | 2015-01-02 | 3 | 9 |
> | bbbbb | 2015-01-02 | 3 | 9 |
> | ccccc | 2015-01-03 | 1 | 1 |
> | ddddd | 2015-01-04 | 1 | 1 |
> | eeeee | 2015-01-05 | 1 | 1 |
> | fffff | 2015-01-06 | 1 | 1 |
> | ggggg | 2015-01-07 | 2 | 4 |
> | ggggg | 2015-01-07 | 2 | 4 |
> | hhhhh | 2015-01-08 | 1 | 1 |
> | iiiii | 2015-01-09 | 1 | 1 |
> | zzz | 2014-12-31 | 1 | 1 |
> +--------+-------------+---------+---------+
> 13 rows selected (0.204 seconds)
> {code}
> Explain plan:
> {code}
> | 00-00 Screen
> 00-01 Project(b2=[$0], c2=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02 SelectionVectorRemover
> 00-03 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-04 Window(window#0=[window(partition {1} order by [] range
> between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($2)])])
> 00-05 SelectionVectorRemover
> 00-06 Sort(sort0=[$1], dir0=[ASC])
> 00-07 Window(window#0=[window(partition {0} order by [1]
> range between UNBOUNDED PRECEDING and CURRENT ROW aggs [SUM($2)])])
> 00-08 SelectionVectorRemover
> 00-09 Sort(sort0=[$0], sort1=[$1], dir0=[ASC],
> dir1=[ASC])
> 00-10 Project(b2=[$1], c2=[$0])
> 00-11 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/t2]],
> selectionRoot=maprfs:/drill/testdata/subqueries/t2, numFiles=1,
> columns=[`b2`, `c2`]]])
> {code}
> If you have a query with only one of these partitions, result is correct :
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . > b2,
> . . . . . . . . . . . . > c2,
> . . . . . . . . . . . . > sum(1) over(partition by b2 order by c2)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > t2
> . . . . . . . . . . . . > order by
> . . . . . . . . . . . . > 1,2;
> +--------+-------------+---------+
> | b2 | c2 | EXPR$2 |
> +--------+-------------+---------+
> | aaaaa | 2015-01-01 | 1 |
> | bbbbb | 2015-01-02 | 3 |
> | bbbbb | 2015-01-02 | 3 |
> | bbbbb | 2015-01-02 | 3 |
> | ccccc | 2015-01-03 | 1 |
> | ddddd | 2015-01-04 | 1 |
> | eeeee | 2015-01-05 | 1 |
> | fffff | 2015-01-06 | 1 |
> | ggggg | 2015-01-07 | 2 |
> | ggggg | 2015-01-07 | 2 |
> | hhhhh | 2015-01-08 | 1 |
> | iiiii | 2015-01-09 | 1 |
> | zzz | 2014-12-31 | 1 |
> +--------+-------------+---------+
> 13 rows selected (0.196 seconds)
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . > b2,
> . . . . . . . . . . . . > c2,
> . . . . . . . . . . . . > sum(1) over(partition by c2)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > t2
> . . . . . . . . . . . . > order by
> . . . . . . . . . . . . > 1,2;
> +--------+-------------+---------+
> | b2 | c2 | EXPR$2 |
> +--------+-------------+---------+
> | aaaaa | 2015-01-01 | 1 |
> | bbbbb | 2015-01-02 | 3 |
> | bbbbb | 2015-01-02 | 3 |
> | bbbbb | 2015-01-02 | 3 |
> | ccccc | 2015-01-03 | 1 |
> | ddddd | 2015-01-04 | 1 |
> | eeeee | 2015-01-05 | 1 |
> | fffff | 2015-01-06 | 1 |
> | ggggg | 2015-01-07 | 2 |
> | ggggg | 2015-01-07 | 2 |
> | hhhhh | 2015-01-08 | 1 |
> | iiiii | 2015-01-09 | 1 |
> | zzz | 2014-12-31 | 1 |
> +--------+-------------+---------+
> 13 rows selected (0.179 seconds)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)