[
https://issues.apache.org/jira/browse/DRILL-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649543#comment-14649543
]
Aman Sinha commented on DRILL-3580:
-----------------------------------
Interestingly, swapping the two window functions changes the plan, and the
new plan looks correct.
{code}
0: jdbc:drill:zk=local> explain plan for select position_id, salary,
sum(position_id + salary) over (partition by position_id), sum(salary) over
(partition by position_id) from cp.`employee.json` limit 20;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(position_id=[$0], salary=[$1], EXPR$2=[$2], EXPR$3=[$3])
00-02 SelectionVectorRemover
00-03 Limit(fetch=[20])
00-04 Project(position_id=[$0], salary=[$1], $2=[$3], $3=[$4])
00-05 Window(window#0=[window(partition {0} order by [] range
between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($2), SUM($1)])])
00-06 SelectionVectorRemover
00-07 Sort(sort0=[$0], dir0=[ASC])
00-08 Project(position_id=[$0], salary=[$1], $2=[+($0, $1)])
00-09 Scan(groupscan=[EasyGroupScan
[selectionRoot=classpath:/employee.json, numFiles=1, columns=[`position_id`,
`salary`], files=[classpath:/employee.json]]])
{code}
I believe the root cause of the problem is projection pushdown. In the
original query, the star column is being projected, and a second Project is
needed to produce the position_id + salary expression. That extra Project
seems to prevent the two Window nodes from being merged. The swapped query
above has only one Project between the Scan and Window nodes. The issue is
quite likely related to DRILL-3412.
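Since the original report also mentions incorrect results, it may help to have an independent reference for what both window aggregates should return. The following is only a sketch in Python, not Drill code: the helper name window_sums and the sample rows are hypothetical and are not taken from employee.json. It assumes the semantics shown in the plan, namely no ORDER BY and an unbounded frame, so each sum covers the whole position_id partition.

```python
from collections import defaultdict


def window_sums(rows):
    """Reference semantics for the two window aggregates in the query.

    rows is a list of (position_id, salary) pairs. For every input row,
    return (position_id, salary, sum(position_id + salary), sum(salary)),
    where both sums are taken over all rows sharing that position_id.
    """
    sum_expr = defaultdict(float)    # sum(position_id + salary) per partition
    sum_salary = defaultdict(float)  # sum(salary) per partition
    for position_id, salary in rows:
        sum_expr[position_id] += position_id + salary
        sum_salary[position_id] += salary
    # Every row in a partition carries the same partition-wide totals.
    return [(p, s, sum_expr[p], sum_salary[p]) for p, s in rows]


# Hypothetical sample rows (an assumption, not data from employee.json).
sample = [(1, 100.0), (1, 50.0), (2, 10.0)]
for row in window_sums(sample):
    print(row)
```

Comparing this kind of reference output against what Drill returns for the same rows, in both column orders, would show whether a given plan is affected. Note that the third and fourth output columns swap places between the original query and the swapped one.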
> wrong plan for window function queries containing function(col1 + colb)
> -----------------------------------------------------------------------
>
> Key: DRILL-3580
> URL: https://issues.apache.org/jira/browse/DRILL-3580
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.1.0
> Reporter: Deneche A. Hakim
> Assignee: Jinfeng Ni
> Priority: Critical
> Labels: window_function
> Fix For: 1.2.0
>
>
> The following query has a wrong plan:
> {noformat}
> explain plan for select position_id, salary, sum(salary) over (partition by
> position_id), sum(position_id + salary) over (partition by position_id) from
> cp.`employee.json` limit 20;
> +------+------+
> | text | json |
> +------+------+
> | 00-00 Screen
> 00-01 ProjectAllowDup(position_id=[$0], salary=[$1], EXPR$2=[$2],
> EXPR$3=[$3])
> 00-02 SelectionVectorRemover
> 00-03 Limit(fetch=[20])
> 00-04 Project(position_id=[$0], salary=[$1], w0$o0=[$2],
> w0$o00=[$4])
> 00-05 Window(window#0=[window(partition {0} order by [] range
> between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($3)])])
> 00-06 Project(position_id=[$1], salary=[$2], w0$o0=[$3],
> $3=[+($1, $2)])
> 00-07 Window(window#0=[window(partition {1} order by []
> range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($2)])])
> 00-08 SelectionVectorRemover
> 00-09 Sort(sort0=[$1], dir0=[ASC])
> 00-10 Project(T13¦¦*=[$0], position_id=[$1],
> salary=[$2])
> 00-11 Scan(groupscan=[EasyGroupScan
> [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`*`],
> files=[classpath:/employee.json]]])
> {noformat}
> The plan contains 2 window operators which shouldn't be possible according to
> DRILL-3196.
> The results are also incorrect.
> Depending on which aggregation or window function is used, we get wrong
> results or an IndexOutOfBounds exception.