Khurram Faraaz created DRILL-3680:
-------------------------------------
Summary: window function query returns Incorrect results
Key: DRILL-3680
URL: https://issues.apache.org/jira/browse/DRILL-3680
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Affects Versions: 1.2.0
Environment: private-branch
https://github.com/adeneche/incubator-drill/tree/new-window-funcs
Reporter: Khurram Faraaz
Assignee: Chris Westin
Priority: Critical
Query plan from Drill for the query that returns wrong results
{code}
0: jdbc:drill:schema=dfs.tmp> explain plan for select c1 , c2 , lead(c2) OVER (
PARTITION BY c2 ORDER BY c1) lead_c2 FROM (SELECT c1 , c2, ntile(3)
over(PARTITION BY c2 ORDER BY c1) FROM `tblWnulls.parquet`);
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(c1=[$0], c2=[$1], lead_c2=[$2])
00-02 Project(c1=[$0], c2=[$1], lead_c2=[$2])
00-03 Project(c1=[$0], c2=[$1], $2=[$3])
00-04 Window(window#0=[window(partition {1} order by [0] range
between UNBOUNDED PRECEDING and CURRENT ROW aggs [LEAD($1)])])
00-05 Window(window#0=[window(partition {1} order by [0] range
between UNBOUNDED PRECEDING and CURRENT ROW aggs [NTILE($2)])])
00-06 SelectionVectorRemover
00-07 Sort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC])
00-08 Project(c1=[$1], c2=[$0])
00-09 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=maprfs:///tmp/tblWnulls.parquet]],
selectionRoot=maprfs:/tmp/tblWnulls.parquet, numFiles=1, columns=[`c1`, `c2`]]])
{code}
Results returned by Drill.
{code}
0: jdbc:drill:schema=dfs.tmp> select c1 , c2 , lead(c2) OVER ( PARTITION BY c2
ORDER BY c1) lead_c2 FROM (SELECT c1 , c2, ntile(3) over(PARTITION BY c2 ORDER
BY c1) FROM `tblWnulls.parquet`);
+-------------+-------+----------+
| c1 | c2 | lead_c2 |
+-------------+-------+----------+
| 0 | a | null |
| 1 | a | null |
| 5 | a | null |
| 10 | a | null |
| 11 | a | null |
| 14 | a | null |
| 11111 | a | null |
| 2 | b | null |
| 9 | b | null |
| 13 | b | null |
| 17 | b | null |
| 4 | c | null |
| 6 | c | null |
| 8 | c | null |
| 12 | c | null |
| 13 | c | null |
| 13 | c | null |
| null | c | null |
| 10 | d | null |
| 11 | d | null |
| 2147483647 | d | null |
| 2147483647 | d | null |
| null | d | null |
| null | d | null |
| -1 | e | null |
| 15 | e | null |
| 19 | null | null |
| 65536 | null | null |
| 1000000 | null | null |
| null | null | null |
+-------------+-------+----------+
30 rows selected (0.339 seconds)
{code}
Results returned by Postgres
{code}
postgres=# select c1 , c2 , lead(c2) OVER ( PARTITION BY c2 ORDER BY c1)
lead_c2 FROM (SELECT c1 , c2, ntile(3) over(PARTITION BY c2 ORDER BY c1) FROM
t222) sub_query;
c1 | c2 | lead_c2
------------+----+---------
0 | a | a
1 | a | a
5 | a | a
10 | a | a
11 | a | a
14 | a | a
11111 | a |
2 | b | b
9 | b | b
13 | b | b
17 | b |
4 | c | c
6 | c | c
8 | c | c
12 | c | c
13 | c | c
13 | c | c
| c |
10 | d | d
11 | d | d
2147483647 | d | d
2147483647 | d | d
| d | d
| d |
-1 | e | e
15 | e |
19 | |
65536 | |
1000000 | |
| |
(30 rows)
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)