[ https://issues.apache.org/jira/browse/DRILL-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720811#comment-14720811 ]
Aman Sinha commented on DRILL-3668: ----------------------------------- I have added review comments to the PR and there's some ongoing discussion on the dev mailing list related to this issue. > Incorrect results FIRST_VALUE function > -------------------------------------- > > Key: DRILL-3668 > URL: https://issues.apache.org/jira/browse/DRILL-3668 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow > Affects Versions: 1.2.0 > Environment: private-branch > https://github.com/adeneche/incubator-drill/tree/new-window-funcs > Reporter: Khurram Faraaz > Assignee: Deneche A. Hakim > Priority: Critical > Labels: window_function > Fix For: 1.2.0 > > > Query returns incorrect results for first_value column on developers private > branch. > Query results from Drill (6 rows) > {code} > 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( > PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY > c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, > first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, > last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM > `tblWnulls.parquet`) sub_query where firstVal_c2 = 'e' ORDER BY tile, c1; > +----------+-------+----------+---------+-------+--------------+-------------+ > | c1 | c2 | lead_c2 | lag_c2 | tile | firstVal_c2 | lastVal_c2 | > +----------+-------+----------+---------+-------+--------------+-------------+ > | -1 | e | e | null | 1 | e | e | > | 19 | null | null | null | 1 | e | null | > | 65536 | null | null | null | 1 | e | null | > | 15 | e | null | e | 2 | e | e | > | 1000000 | null | null | null | 2 | e | null | > | null | null | null | null | 3 | e | null | > +----------+-------+----------+---------+-------+--------------+-------------+ > 6 rows selected (0.269 seconds) > {code} > Query results from Postgres for same input data (2 rows returned) > {code} > postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 > ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, > ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( > PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY > c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 = 'e' ORDER > BY tile, c1; > c1 | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 > ----+----+---------+--------+------+-------------+------------ > -1 | e | e | | 1 | e | e > 15 | e | | e | 2 | e | e > (2 rows) > {code} > Another query that returns different results > Results returned by Drill - zero rows returned > {code} > 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( > PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY > c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, > first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, > last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM > `tblWnulls.parquet`) sub_query where firstVal_c2 is null ORDER BY tile, c1; > +-----+-----+----------+---------+-------+--------------+-------------+ > | c1 | c2 | lead_c2 | lag_c2 | tile | firstVal_c2 | lastVal_c2 | > +-----+-----+----------+---------+-------+--------------+-------------+ > +-----+-----+----------+---------+-------+--------------+-------------+ > No rows selected (0.279 seconds) > {code} > Results returned by Postgres for same input data, 4 rows returned > {code} > postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 > ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, > ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( > PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY > c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 is null > ORDER BY tile, c1; > c1 | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 > ---------+----+---------+--------+------+-------------+------------ > 19 | | | | 1 | | > 65536 | | | | 1 | | > 1000000 | | | | 2 | | > | | | | 3 | | > (4 rows) > {code} > Table definition on Postgres > {code} > postgres=# \d t222 > Table "public.t222" > Column | Type | Modifiers > --------+--------------+----------- > c1 | integer | > c2 | character(1) | > {code} > Data from the parquet file used in above queries on Drill > {code} > 0: jdbc:drill:schema=dfs.tmp> select * from `tblWnulls.parquet`; > +-------------+-------+ > | c1 | c2 | > +-------------+-------+ > | 1 | a | > | 2 | b | > | 13 | c | > | 4 | c | > | 5 | a | > | 6 | c | > | null | d | > | 17 | b | > | 8 | c | > | 9 | b | > | 10 | d | > | 2147483647 | d | > | 10 | a | > | 11 | a | > | null | c | > | 11 | d | > | 12 | c | > | 19 | null | > | 13 | b | > | 14 | a | > | 13 | c | > | 15 | e | > | -1 | e | > | 0 | a | > | 2147483647 | d | > | null | d | > | 65536 | null | > | 1000000 | null | > | null | null | > | 11111 | a | > +-------------+-------+ > 30 rows selected (0.145 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)