[ 
https://issues.apache.org/jira/browse/DRILL-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720811#comment-14720811
 ] 

Aman Sinha commented on DRILL-3668:
-----------------------------------

I have added review comments to the PR and there's some ongoing discussion on 
the dev mailing list related to this issue. 

> Incorrect results FIRST_VALUE function
> --------------------------------------
>
>                 Key: DRILL-3668
>                 URL: https://issues.apache.org/jira/browse/DRILL-3668
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: private-branch 
> https://github.com/adeneche/incubator-drill/tree/new-window-funcs
>            Reporter: Khurram Faraaz
>            Assignee: Deneche A. Hakim
>            Priority: Critical
>              Labels: window_function
>             Fix For: 1.2.0
>
>
> Query returns incorrect results for first_value column on developers private 
> branch.
> Query results from Drill (6 rows)
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( 
> PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY 
> c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, 
> first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, 
> last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM 
> `tblWnulls.parquet`) sub_query where firstVal_c2 = 'e' ORDER BY tile, c1;
> +----------+-------+----------+---------+-------+--------------+-------------+
> |    c1    |  c2   | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
> +----------+-------+----------+---------+-------+--------------+-------------+
> | -1       | e     | e        | null    | 1     | e            | e           |
> | 19       | null  | null     | null    | 1     | e            | null        |
> | 65536    | null  | null     | null    | 1     | e            | null        |
> | 15       | e     | null     | e       | 2     | e            | e           |
> | 1000000  | null  | null     | null    | 2     | e            | null        |
> | null     | null  | null     | null    | 3     | e            | null        |
> +----------+-------+----------+---------+-------+--------------+-------------+
> 6 rows selected (0.269 seconds)
> {code}
> Query results from Postgres for same input data (2 rows returned)
> {code}
> postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 
> ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, 
> ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( 
> PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY 
> c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 = 'e' ORDER 
> BY tile, c1;
>  c1 | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 
> ----+----+---------+--------+------+-------------+------------
>  -1 | e  | e       |        |    1 | e           | e
>  15 | e  |         | e      |    2 | e           | e
> (2 rows)
> {code}
> Another query that returns different results
> Results returned by Drill - zero rows returned
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( 
> PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY 
> c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, 
> first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, 
> last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM 
> `tblWnulls.parquet`) sub_query where firstVal_c2 is null ORDER BY tile, c1;
> +-----+-----+----------+---------+-------+--------------+-------------+
> | c1  | c2  | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
> +-----+-----+----------+---------+-------+--------------+-------------+
> +-----+-----+----------+---------+-------+--------------+-------------+
> No rows selected (0.279 seconds)
> {code}
> Results returned by Postgres for same input data, 4 rows returned
> {code}
> postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 
> ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, 
> ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( 
> PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY 
> c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 is null 
> ORDER BY tile, c1;
>    c1    | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 
> ---------+----+---------+--------+------+-------------+------------
>       19 |    |         |        |    1 |             | 
>    65536 |    |         |        |    1 |             | 
>  1000000 |    |         |        |    2 |             | 
>          |    |         |        |    3 |             | 
> (4 rows)
> {code}
> Table definition on Postgres
> {code}
> postgres=# \d t222
>         Table "public.t222"
>  Column |     Type     | Modifiers 
> --------+--------------+-----------
>  c1     | integer      | 
>  c2     | character(1) | 
> {code}
> Data from the parquet file used in above queries on Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select * from `tblWnulls.parquet`;
> +-------------+-------+
> |     c1      |  c2   |
> +-------------+-------+
> | 1           | a     |
> | 2           | b     |
> | 13          | c     |
> | 4           | c     |
> | 5           | a     |
> | 6           | c     |
> | null        | d     |
> | 17          | b     |
> | 8           | c     |
> | 9           | b     |
> | 10          | d     |
> | 2147483647  | d     |
> | 10          | a     |
> | 11          | a     |
> | null        | c     |
> | 11          | d     |
> | 12          | c     |
> | 19          | null  |
> | 13          | b     |
> | 14          | a     |
> | 13          | c     |
> | 15          | e     |
> | -1          | e     |
> | 0           | a     |
> | 2147483647  | d     |
> | null        | d     |
> | 65536       | null  |
> | 1000000     | null  |
> | null        | null  |
> | 11111       | a     |
> +-------------+-------+
> 30 rows selected (0.145 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to