[ 
https://issues.apache.org/jira/browse/DRILL-3567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646873#comment-14646873
 ] 

Sean Hsuan-Yi Chu commented on DRILL-3567:
------------------------------------------

The cause resides in Calcite (will file an issue shortly). 

The reason is that, in [~vicky]'s original query, the second column has its own 
window while the other two share the same one. Since that window is shared, 
Calcite's LogicalWindow puts these two window functions together, which 
unfortunately changes the order.

The solution is to let the LogicalProject above be aware of that. 

Let me try to work on it.

> Wrong result in a query with multiple window functions and different over 
> clauses
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-3567
>                 URL: https://issues.apache.org/jira/browse/DRILL-3567
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.1.0
>         Environment: private-branch-with-multiple-partitions-enabled
>            Reporter: Victoria Markman
>            Assignee: Sean Hsuan-Yi Chu
>            Priority: Critical
>              Labels: window_function
>         Attachments: t1_parquet
>
>
> {code}
> 0: jdbc:drill:drillbit=localhost> select * from t1;
> +-------+--------+-------------+
> |  a1   |   b1   |     c1      |
> +-------+--------+-------------+
> | 1     | aaaaa  | 2015-01-01  |
> | 2     | bbbbb  | 2015-01-02  |
> | 3     | ccccc  | 2015-01-03  |
> | 4     | null   | 2015-01-04  |
> | 5     | eeeee  | 2015-01-05  |
> | 6     | fffff  | 2015-01-06  |
> | 7     | ggggg  | 2015-01-07  |
> | null  | hhhhh  | 2015-01-08  |
> | 9     | iiiii  | null        |
> | 10    | jjjjj  | 2015-01-10  |
> +-------+--------+-------------+
> 10 rows selected (0.078 seconds)
> {code}
> Wrong result, columns are projected in the wrong order:
> {code}
> 0: jdbc:drill:drillbit=localhost> select
> . . . . . . . . . . . . . . . . >         count(*) over(partition by b1 order 
> by c1) as count1,
> . . . . . . . . . . . . . . . . >         count(*) over(partition by a1 order 
> by c1) as count2,
> . . . . . . . . . . . . . . . . >         sum(a1)  over(partition by b1 order 
> by c1) as sum1
> . . . . . . . . . . . . . . . . > from 
> . . . . . . . . . . . . . . . . >         t1;
> +---------+---------+-------+
> | count1  | count2  | sum1  |
> +---------+---------+-------+
> | 1       | 1       | 1     |
> | 1       | 2       | 1     |
> | 1       | 3       | 1     |
> | 1       | 4       | 1     |
> | 1       | 5       | 1     |
> | 1       | 6       | 1     |
> | 1       | 7       | 1     |
> | 1       | 9       | 1     |
> | 1       | 10      | 1     |
> | 1       | null    | 1     |
> +---------+---------+-------+
> 10 rows selected (0.113 seconds)
> {code}
> Explain plan:
> {code}
> 0: jdbc:drill:drillbit=localhost> explain plan for select
> . . . . . . . . . . . . . . . . >         count(*) over(partition by b1 order 
> by c1) as count1,
> . . . . . . . . . . . . . . . . >         count(*) over(partition by a1 order 
> by c1) as count2,
> . . . . . . . . . . . . . . . . >         sum(a1)  over(partition by b1 order 
> by c1) as sum1
> . . . . . . . . . . . . . . . . > from 
> . . . . . . . . . . . . . . . . >         t1;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      ProjectAllowDup(count1=[$0], count2=[$1], sum1=[$2])
> 00-02        Project(w0$o0=[$4], w0$o1=[$5], w1$o0=[$6])
> 00-03          Window(window#0=[window(partition {3} order by [2] range 
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT()])])
> 00-04            SelectionVectorRemover
> 00-05              Sort(sort0=[$3], sort1=[$2], dir0=[ASC], dir1=[ASC])
> 00-06                Window(window#0=[window(partition {1} order by [2] range 
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT(), SUM($3)])])
> 00-07                  SelectionVectorRemover
> 00-08                    Sort(sort0=[$1], sort1=[$2], dir0=[ASC], dir1=[ASC])
> 00-09                      Project(T61¦¦*=[$0], b1=[$1], c1=[$2], a1=[$3])
> 00-10                        Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=file:/Users/vmarkman/drill/testdata/subqueries/t1]], 
> selectionRoot=file:/Users/vmarkman/drill/testdata/subqueries/t1, numFiles=1, 
> columns=[`*`]]])
> {code}
> If you remove frame that is not the same as other two, query works correctly:
> {code}
> 0: jdbc:drill:drillbit=localhost> select
> . . . . . . . . . . . . . . . . >         count(*) over(partition by b1 order 
> by c1) as count1,
> . . . . . . . . . . . . . . . . >         sum(a1)  over(partition by b1 order 
> by c1) as sum1
> . . . . . . . . . . . . . . . . > from
> . . . . . . . . . . . . . . . . >         t1;
> +---------+-------+
> | count1  | sum1  |
> +---------+-------+
> | 1       | 1     |
> | 1       | 2     |
> | 1       | 3     |
> | 1       | 5     |
> | 1       | 6     |
> | 1       | 7     |
> | 1       | null  |
> | 1       | 9     |
> | 1       | 10    |
> | 1       | 4     |
> +---------+-------+
> 10 rows selected (0.099 seconds)
> {code}
> and in the different order (just for fun) :
> {code}
> 0: jdbc:drill:drillbit=localhost> select
> . . . . . . . . . . . . . . . . >         sum(a1)  over(partition by b1 order 
> by c1) as sum1,
> . . . . . . . . . . . . . . . . >         count(*) over(partition by b1 order 
> by c1) as count1
> . . . . . . . . . . . . . . . . > from 
> . . . . . . . . . . . . . . . . >         t1;
> +-------+---------+
> | sum1  | count1  |
> +-------+---------+
> | 1     | 1       |
> | 2     | 1       |
> | 3     | 1       |
> | 5     | 1       |
> | 6     | 1       |
> | 7     | 1       |
> | null  | 1       |
> | 9     | 1       |
> | 10    | 1       |
> | 4     | 1       |
> +-------+---------+
> 10 rows selected (0.096 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to