[ 
https://issues.apache.org/jira/browse/DRILL-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732463#comment-14732463
 ] 

Aman Sinha commented on DRILL-3668:
-----------------------------------

Updated patch looks ok to me since the new patch is resetting the internal 
batch vector locally as part of Window function operator.  For future 
reference, could you add the link to the discussion we had on the dev mailing 
list regarding this issue ?
+1.  



> Incorrect results FIRST_VALUE function
> --------------------------------------
>
>                 Key: DRILL-3668
>                 URL: https://issues.apache.org/jira/browse/DRILL-3668
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: private-branch 
> https://github.com/adeneche/incubator-drill/tree/new-window-funcs
>            Reporter: Khurram Faraaz
>            Assignee: Aman Sinha
>            Priority: Critical
>              Labels: window_function
>             Fix For: 1.2.0
>
>
> Query returns incorrect results for first_value column on developers private 
> branch.
> Query results from Drill (6 rows)
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( 
> PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY 
> c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, 
> first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, 
> last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM 
> `tblWnulls.parquet`) sub_query where firstVal_c2 = 'e' ORDER BY tile, c1;
> +----------+-------+----------+---------+-------+--------------+-------------+
> |    c1    |  c2   | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
> +----------+-------+----------+---------+-------+--------------+-------------+
> | -1       | e     | e        | null    | 1     | e            | e           |
> | 19       | null  | null     | null    | 1     | e            | null        |
> | 65536    | null  | null     | null    | 1     | e            | null        |
> | 15       | e     | null     | e       | 2     | e            | e           |
> | 1000000  | null  | null     | null    | 2     | e            | null        |
> | null     | null  | null     | null    | 3     | e            | null        |
> +----------+-------+----------+---------+-------+--------------+-------------+
> 6 rows selected (0.269 seconds)
> {code}
> Query results from Postgres for same input data (2 rows returned)
> {code}
> postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 
> ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, 
> ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( 
> PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY 
> c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 = 'e' ORDER 
> BY tile, c1;
>  c1 | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 
> ----+----+---------+--------+------+-------------+------------
>  -1 | e  | e       |        |    1 | e           | e
>  15 | e  |         | e      |    2 | e           | e
> (2 rows)
> {code}
> Another query that returns different results
> Results returned by Drill - zero rows returned
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( 
> PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY 
> c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, 
> first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, 
> last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM 
> `tblWnulls.parquet`) sub_query where firstVal_c2 is null ORDER BY tile, c1;
> +-----+-----+----------+---------+-------+--------------+-------------+
> | c1  | c2  | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
> +-----+-----+----------+---------+-------+--------------+-------------+
> +-----+-----+----------+---------+-------+--------------+-------------+
> No rows selected (0.279 seconds)
> {code}
> Results returned by Postgres for same input data, 4 rows returned
> {code}
> postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 
> ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, 
> ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( 
> PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY 
> c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 is null 
> ORDER BY tile, c1;
>    c1    | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 
> ---------+----+---------+--------+------+-------------+------------
>       19 |    |         |        |    1 |             | 
>    65536 |    |         |        |    1 |             | 
>  1000000 |    |         |        |    2 |             | 
>          |    |         |        |    3 |             | 
> (4 rows)
> {code}
> Table definition on Postgres
> {code}
> postgres=# \d t222
>         Table "public.t222"
>  Column |     Type     | Modifiers 
> --------+--------------+-----------
>  c1     | integer      | 
>  c2     | character(1) | 
> {code}
> Data from the parquet file used in above queries on Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select * from `tblWnulls.parquet`;
> +-------------+-------+
> |     c1      |  c2   |
> +-------------+-------+
> | 1           | a     |
> | 2           | b     |
> | 13          | c     |
> | 4           | c     |
> | 5           | a     |
> | 6           | c     |
> | null        | d     |
> | 17          | b     |
> | 8           | c     |
> | 9           | b     |
> | 10          | d     |
> | 2147483647  | d     |
> | 10          | a     |
> | 11          | a     |
> | null        | c     |
> | 11          | d     |
> | 12          | c     |
> | 19          | null  |
> | 13          | b     |
> | 14          | a     |
> | 13          | c     |
> | 15          | e     |
> | -1          | e     |
> | 0           | a     |
> | 2147483647  | d     |
> | null        | d     |
> | 65536       | null  |
> | 1000000     | null  |
> | null        | null  |
> | 11111       | a     |
> +-------------+-------+
> 30 rows selected (0.145 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to