[ 
https://issues.apache.org/jira/browse/DRILL-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649543#comment-14649543
 ] 

Aman Sinha commented on DRILL-3580:
-----------------------------------

Interestingly, swapping the 2 window functions changes the plan, and the new 
plan looks correct (a single Window node computes both sums):
{code}
0: jdbc:drill:zk=local> explain plan for select position_id, salary, 
sum(position_id + salary) over (partition by position_id), sum(salary) over 
(partition by position_id) from cp.`employee.json` limit 20;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(position_id=[$0], salary=[$1], EXPR$2=[$2], EXPR$3=[$3])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[20])
00-04            Project(position_id=[$0], salary=[$1], $2=[$3], $3=[$4])
00-05              Window(window#0=[window(partition {0} order by [] range 
between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($2), SUM($1)])])
00-06                SelectionVectorRemover
00-07                  Sort(sort0=[$0], dir0=[ASC])
00-08                    Project(position_id=[$0], salary=[$1], $2=[+($0, $1)])
00-09                      Scan(groupscan=[EasyGroupScan 
[selectionRoot=classpath:/employee.json, numFiles=1, columns=[`position_id`, 
`salary`], files=[classpath:/employee.json]]])
{code}
I believe the root cause of the problem is projection pushdown.  In the 
original query, the star column is being projected and another Project is 
needed to produce the position_id + salary expression.  That extra Project 
sits between the two Window nodes and seems to prevent them from being 
merged.  The second query has only 1 Project between the Scan and Window 
nodes.  It is quite likely the issue is related to DRILL-3412.
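
To compare the two plans without reading them line by line, one option is to count the Window operators in the text column of the explain output. The snippet below is only a minimal sketch and not part of Drill: the helper name count_operators and the regular expression are assumptions based solely on the "MM-NN  Operator(...)" line format visible in the plans quoted in this thread, and the sample plans are abbreviated by hand.

```python
import re

# An operator line starts with an "MM-NN" id followed by the operator name,
# optionally preceded by table borders ("|") or mail quoting (">").
# This format is assumed from the plans quoted in this thread.
OPERATOR_LINE = re.compile(r"^[>|\s]*\d{2}-\d{2}\s+(\w+)")


def count_operators(plan_text, name):
    """Count the plan lines whose operator name equals `name`."""
    count = 0
    for line in plan_text.splitlines():
        match = OPERATOR_LINE.match(line)
        if match and match.group(1) == name:
            count += 1
    return count


# Hand-abbreviated versions of the two plans discussed above.
swapped_plan = """\
| 00-00    Screen
00-05              Window(window#0=[window(partition {0} ...)])
00-07                  Sort(sort0=[$0], dir0=[ASC])
"""

original_plan = """\
> | 00-00    Screen
> 00-05              Window(window#0=[window(partition {0} ...)])
> 00-06                Project(position_id=[$1], salary=[$2], ...)
> 00-07                  Window(window#0=[window(partition {1} ...)])
"""

print(count_operators(swapped_plan, "Window"))
print(count_operators(original_plan, "Window"))
```

A count greater than 1 for a query whose window functions all share the same partition clause would flag a plan like the one in the original report.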

> wrong plan for window function queries containing function(col1 + colb)
> -----------------------------------------------------------------------
>
>                 Key: DRILL-3580
>                 URL: https://issues.apache.org/jira/browse/DRILL-3580
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.1.0
>            Reporter: Deneche A. Hakim
>            Assignee: Jinfeng Ni
>            Priority: Critical
>              Labels: window_function
>             Fix For: 1.2.0
>
>
> The following query has a wrong plan:
> {noformat}
> explain plan for select position_id, salary, sum(salary) over (partition by 
> position_id), sum(position_id + salary) over (partition by position_id) from 
> cp.`employee.json` limit 20;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      ProjectAllowDup(position_id=[$0], salary=[$1], EXPR$2=[$2], 
> EXPR$3=[$3])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[20])
> 00-04            Project(position_id=[$0], salary=[$1], w0$o0=[$2], 
> w0$o00=[$4])
> 00-05              Window(window#0=[window(partition {0} order by [] range 
> between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($3)])])
> 00-06                Project(position_id=[$1], salary=[$2], w0$o0=[$3], 
> $3=[+($1, $2)])
> 00-07                  Window(window#0=[window(partition {1} order by [] 
> range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($2)])])
> 00-08                    SelectionVectorRemover
> 00-09                      Sort(sort0=[$1], dir0=[ASC])
> 00-10                        Project(T13¦¦*=[$0], position_id=[$1], 
> salary=[$2])
> 00-11                          Scan(groupscan=[EasyGroupScan 
> [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`*`], 
> files=[classpath:/employee.json]]])
> {noformat}
> The plan contains 2 window operators, which shouldn't be possible according 
> to DRILL-3196. 
> The results are also incorrect.
> Depending on which aggregation or window function is used, we get wrong 
> results or an IndexOutOfBounds exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
