[
https://issues.apache.org/jira/browse/DRILL-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Khurram Faraaz updated DRILL-3680:
----------------------------------
Assignee: Deneche A. Hakim (was: Chris Westin)
> window function query returns Incorrect results
> ------------------------------------------------
>
> Key: DRILL-3680
> URL: https://issues.apache.org/jira/browse/DRILL-3680
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.2.0
> Environment: private-branch
> https://github.com/adeneche/incubator-drill/tree/new-window-funcs
> Reporter: Khurram Faraaz
> Assignee: Deneche A. Hakim
> Priority: Critical
> Labels: window_function
>
> Query plan from Drill for the query that returns wrong results
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select c1 , c2 , lead(c2) OVER
> ( PARTITION BY c2 ORDER BY c1) lead_c2 FROM (SELECT c1 , c2, ntile(3)
> over(PARTITION BY c2 ORDER BY c1) FROM `tblWnulls.parquet`);
> +------+------+
> | text | json |
> +------+------+
> | 00-00 Screen
> 00-01 Project(c1=[$0], c2=[$1], lead_c2=[$2])
> 00-02 Project(c1=[$0], c2=[$1], lead_c2=[$2])
> 00-03 Project(c1=[$0], c2=[$1], $2=[$3])
> 00-04 Window(window#0=[window(partition {1} order by [0] range
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [LEAD($1)])])
> 00-05 Window(window#0=[window(partition {1} order by [0] range
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [NTILE($2)])])
> 00-06 SelectionVectorRemover
> 00-07 Sort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC])
> 00-08 Project(c1=[$1], c2=[$0])
> 00-09 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/tblWnulls.parquet]],
> selectionRoot=maprfs:/tmp/tblWnulls.parquet, numFiles=1, columns=[`c1`,
> `c2`]]])
> {code}
> Results returned by Drill.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select c1 , c2 , lead(c2) OVER ( PARTITION BY
> c2 ORDER BY c1) lead_c2 FROM (SELECT c1 , c2, ntile(3) over(PARTITION BY c2
> ORDER BY c1) FROM `tblWnulls.parquet`);
> +-------------+-------+----------+
> | c1 | c2 | lead_c2 |
> +-------------+-------+----------+
> | 0 | a | null |
> | 1 | a | null |
> | 5 | a | null |
> | 10 | a | null |
> | 11 | a | null |
> | 14 | a | null |
> | 11111 | a | null |
> | 2 | b | null |
> | 9 | b | null |
> | 13 | b | null |
> | 17 | b | null |
> | 4 | c | null |
> | 6 | c | null |
> | 8 | c | null |
> | 12 | c | null |
> | 13 | c | null |
> | 13 | c | null |
> | null | c | null |
> | 10 | d | null |
> | 11 | d | null |
> | 2147483647 | d | null |
> | 2147483647 | d | null |
> | null | d | null |
> | null | d | null |
> | -1 | e | null |
> | 15 | e | null |
> | 19 | null | null |
> | 65536 | null | null |
> | 1000000 | null | null |
> | null | null | null |
> +-------------+-------+----------+
> 30 rows selected (0.339 seconds)
> {code}
> Results returned by Postgres
> {code}
> postgres=# select c1 , c2 , lead(c2) OVER ( PARTITION BY c2 ORDER BY c1)
> lead_c2 FROM (SELECT c1 , c2, ntile(3) over(PARTITION BY c2 ORDER BY c1) FROM
> t222) sub_query;
> c1 | c2 | lead_c2
> ------------+----+---------
> 0 | a | a
> 1 | a | a
> 5 | a | a
> 10 | a | a
> 11 | a | a
> 14 | a | a
> 11111 | a |
> 2 | b | b
> 9 | b | b
> 13 | b | b
> 17 | b |
> 4 | c | c
> 6 | c | c
> 8 | c | c
> 12 | c | c
> 13 | c | c
> 13 | c | c
> | c |
> 10 | d | d
> 11 | d | d
> 2147483647 | d | d
> 2147483647 | d | d
> | d | d
> | d |
> -1 | e | e
> 15 | e |
> 19 | |
> 65536 | |
> 1000000 | |
> | |
> (30 rows)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)