[
https://issues.apache.org/jira/browse/DRILL-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606444#comment-14606444
]
Jinfeng Ni commented on DRILL-3414:
-----------------------------------
Good to know this is not a bug in terms of query result correctness ! :-)
On the other hand, the query plan seems to be not optimal; there are two sort
operaors, and one of them seems to be redundant.
{code}
0: jdbc:drill:zk=local> explain plan for
. . . . . . . . . . . > select
. . . . . . . . . . . > j4.c_boolean,
. . . . . . . . . . . > j4.c_date,
. . . . . . . . . . . > j4.c_integer,
. . . . . . . . . . . > sum(j4.c_integer) over (partition by
j4.c_boolean order by j4.c_date, j4.c_integer)
. . . . . . . . . . . > from j1
. . . . . . . . . . . > left outer join
. . . . . . . . . . . > j4 on j1.c_integer = j4.c_integer
. . . . . . . . . . . > order by 1,2,3;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(c_boolean=[$0], c_date=[$1], c_integer=[$2], EXPR$3=[$3])
00-02 SelectionVectorRemover
00-03 Sort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC],
dir2=[ASC])
00-04 Project(c_boolean=[$1], c_date=[$2], c_integer=[$0],
EXPR$3=[CASE(>($3, 0), $4, null)])
00-05 Window(window#0=[window(partition {1} order by [2, 0] range
between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])])
00-06 SelectionVectorRemover
00-07 Sort(sort0=[$1], sort1=[$2], sort2=[$0], dir0=[ASC],
dir1=[ASC], dir2=[ASC])
00-08 Project(c_integer0=[$1], c_boolean=[$2], c_date=[$3])
00-09 HashJoin(condition=[=($0, $1)], joinType=[left])
00-11 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=file:/tmp/j1]], selectionRoot=/tmp/j1,
numFiles=1, columns=[`c_integer`]]])
00-10 Project(c_integer0=[$0], c_boolean=[$1],
c_date=[$2])
00-12 Project(c_integer=[$1], c_boolean=[$0],
c_date=[$2])
00-13 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=file:/tmp/j4]], selectionRoot=/tmp/j4,
numFiles=1, columns=[`c_integer`, `c_boolean`, `c_date`]]])
{code}
Operator 00-03 is not necessary, since operator 00-07 has sorted the input by
c_boolean, c_date, c_integer. Ideally, the planer should detect the redundant
sort operators, and remove the second one.
Please open another JIRA for this query optimization issue, and target for
1.2.0.
> Window function on a null producing column of an outer join results in the
> wrong result
> ---------------------------------------------------------------------------------------
>
> Key: DRILL-3414
> URL: https://issues.apache.org/jira/browse/DRILL-3414
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.0.0
> Reporter: Victoria Markman
> Assignee: Jinfeng Ni
> Priority: Critical
> Labels: window_funcion
> Attachments: j1.tar, j4.parquet, query.tar
>
>
> {code:sql}
> select
> j4.c_boolean,
> j4.c_date,
> j4.c_integer,
> sum(j4.c_integer) over (partition by j4.c_boolean order by j4.c_date,
> j4.c_integer)
> from j1
> left outer join
> j4 on j1.c_integer = j4.c_integer
> order by 1,2,3;
> {code}
> If window function is on left side, query returns correct result.
> This works:
> {code:sql}
> select
> j1.c_boolean,
> j1.c_date,
> sum(j1.c_integer) over (partition by j1.c_boolean order by j1.c_date)
> from
> j1
> left outer join
> j4 on j1.c_integer = j4.c_integer
> order by
> 1, 2;
> {code}
> Attached:
> 1. query.tar (q2.sql , q2.res (postgres output), q2.out (drill output) )
> 2. tables : j1.tar, j4.parquet
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)