[ 
https://issues.apache.org/jira/browse/DRILL-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606444#comment-14606444
 ] 

Jinfeng Ni commented on DRILL-3414:
-----------------------------------

Good to know this is not a bug in terms of query result correctness ! :-) 

On the other hand, the query plan seems to be not optimal; there are two sort 
operaors, and one of them seems to be redundant.

{code}

0: jdbc:drill:zk=local> explain plan for
. . . . . . . . . . . > select
. . . . . . . . . . . >         j4.c_boolean,
. . . . . . . . . . . >         j4.c_date,
. . . . . . . . . . . >         j4.c_integer,
. . . . . . . . . . . >         sum(j4.c_integer) over (partition by 
j4.c_boolean order by j4.c_date, j4.c_integer)
. . . . . . . . . . . > from    j1
. . . . . . . . . . . >            left outer join
. . . . . . . . . . . >            j4 on j1.c_integer = j4.c_integer
. . . . . . . . . . . > order by 1,2,3;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(c_boolean=[$0], c_date=[$1], c_integer=[$2], EXPR$3=[$3])
00-02        SelectionVectorRemover
00-03          Sort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], 
dir2=[ASC])
00-04            Project(c_boolean=[$1], c_date=[$2], c_integer=[$0], 
EXPR$3=[CASE(>($3, 0), $4, null)])
00-05              Window(window#0=[window(partition {1} order by [2, 0] range 
between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])])
00-06                SelectionVectorRemover
00-07                  Sort(sort0=[$1], sort1=[$2], sort2=[$0], dir0=[ASC], 
dir1=[ASC], dir2=[ASC])
00-08                    Project(c_integer0=[$1], c_boolean=[$2], c_date=[$3])
00-09                      HashJoin(condition=[=($0, $1)], joinType=[left])
00-11                        Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=file:/tmp/j1]], selectionRoot=/tmp/j1, 
numFiles=1, columns=[`c_integer`]]])
00-10                        Project(c_integer0=[$0], c_boolean=[$1], 
c_date=[$2])
00-12                          Project(c_integer=[$1], c_boolean=[$0], 
c_date=[$2])
00-13                            Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=file:/tmp/j4]], selectionRoot=/tmp/j4, 
numFiles=1, columns=[`c_integer`, `c_boolean`, `c_date`]]])
{code}


Operator 00-03 is not necessary, since operator 00-07 has sorted the input by 
c_boolean, c_date, c_integer. Ideally, the planer should detect the redundant 
sort operators, and remove the second one. 

Please open another JIRA for this query optimization issue, and target for 
1.2.0. 



> Window function on a null producing column of an outer join results in the 
> wrong result
> ---------------------------------------------------------------------------------------
>
>                 Key: DRILL-3414
>                 URL: https://issues.apache.org/jira/browse/DRILL-3414
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni
>            Priority: Critical
>              Labels: window_funcion
>         Attachments: j1.tar, j4.parquet, query.tar
>
>
> {code:sql}
> select
>         j4.c_boolean,
>         j4.c_date,
>         j4.c_integer,
>         sum(j4.c_integer) over (partition by j4.c_boolean order by j4.c_date, 
> j4.c_integer)
> from    j1
>            left outer join
>            j4 on j1.c_integer = j4.c_integer
> order by 1,2,3;
> {code}
> If window function is on left side, query returns correct result.
> This works:
> {code:sql}
> select
>         j1.c_boolean,
>         j1.c_date,
>         sum(j1.c_integer) over (partition by j1.c_boolean order by j1.c_date)
> from
>         j1
>                 left outer join
>         j4 on j1.c_integer = j4.c_integer
> order by
>         1, 2;
> {code}
> Attached:
> 1. query.tar (q2.sql , q2.res (postgres output), q2.out (drill output) )
> 2. tables : j1.tar, j4.parquet



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to