[
https://issues.apache.org/jira/browse/DRILL-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717600#comment-14717600
]
Khurram Faraaz commented on DRILL-3668:
---------------------------------------
Verified on private branch that the below two queries give correct results. I
will verify on master once the fix is available.
commitID on private branch : ce658fe1
{code}
0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER (
PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY
c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2)
OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER (
PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query
where firstVal_c2 = 'e' ORDER BY tile, c1;
+-----+-----+----------+---------+-------+--------------+-------------+
| c1 | c2 | lead_c2 | lag_c2 | tile | firstVal_c2 | lastVal_c2 |
+-----+-----+----------+---------+-------+--------------+-------------+
| -1 | e | e | null | 1 | e | e |
| 15 | e | null | e | 2 | e | e |
+-----+-----+----------+---------+-------+--------------+-------------+
2 rows selected (2.078 seconds)
{code}
{code}
0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER (
PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY
c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2)
OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER (
PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query
where firstVal_c2 is null ORDER BY tile, c1;
+----------+-------+----------+---------+-------+--------------+-------------+
| c1 | c2 | lead_c2 | lag_c2 | tile | firstVal_c2 | lastVal_c2 |
+----------+-------+----------+---------+-------+--------------+-------------+
| 19 | null | null | null | 1 | null | null |
| 65536 | null | null | null | 1 | null | null |
| 1000000 | null | null | null | 2 | null | null |
| null | null | null | null | 3 | null | null |
+----------+-------+----------+---------+-------+--------------+-------------+
4 rows selected (0.499 seconds)
{code}
> Incorrect results FIRST_VALUE function
> --------------------------------------
>
> Key: DRILL-3668
> URL: https://issues.apache.org/jira/browse/DRILL-3668
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.2.0
> Environment: private-branch
> https://github.com/adeneche/incubator-drill/tree/new-window-funcs
> Reporter: Khurram Faraaz
> Assignee: Aman Sinha
> Priority: Critical
> Labels: window_function
> Fix For: 1.2.0
>
>
> Query returns incorrect results for first_value column on developers private
> branch.
> Query results from Drill (6 rows)
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER (
> PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY
> c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile,
> first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2,
> last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM
> `tblWnulls.parquet`) sub_query where firstVal_c2 = 'e' ORDER BY tile, c1;
> +----------+-------+----------+---------+-------+--------------+-------------+
> | c1 | c2 | lead_c2 | lag_c2 | tile | firstVal_c2 | lastVal_c2 |
> +----------+-------+----------+---------+-------+--------------+-------------+
> | -1 | e | e | null | 1 | e | e |
> | 19 | null | null | null | 1 | e | null |
> | 65536 | null | null | null | 1 | e | null |
> | 15 | e | null | e | 2 | e | e |
> | 1000000 | null | null | null | 2 | e | null |
> | null | null | null | null | 3 | e | null |
> +----------+-------+----------+---------+-------+--------------+-------------+
> 6 rows selected (0.269 seconds)
> {code}
> Query results from Postgres for same input data (2 rows returned)
> {code}
> postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2
> ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2,
> ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER (
> PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY
> c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 = 'e' ORDER
> BY tile, c1;
> c1 | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2
> ----+----+---------+--------+------+-------------+------------
> -1 | e | e | | 1 | e | e
> 15 | e | | e | 2 | e | e
> (2 rows)
> {code}
> Another query that returns different results
> Results returned by Drill - zero rows returned
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER (
> PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY
> c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile,
> first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2,
> last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM
> `tblWnulls.parquet`) sub_query where firstVal_c2 is null ORDER BY tile, c1;
> +-----+-----+----------+---------+-------+--------------+-------------+
> | c1 | c2 | lead_c2 | lag_c2 | tile | firstVal_c2 | lastVal_c2 |
> +-----+-----+----------+---------+-------+--------------+-------------+
> +-----+-----+----------+---------+-------+--------------+-------------+
> No rows selected (0.279 seconds)
> {code}
> Results returned by Postgres for same input data, 4 rows returned
> {code}
> postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2
> ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2,
> ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER (
> PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY
> c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 is null
> ORDER BY tile, c1;
> c1 | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2
> ---------+----+---------+--------+------+-------------+------------
> 19 | | | | 1 | |
> 65536 | | | | 1 | |
> 1000000 | | | | 2 | |
> | | | | 3 | |
> (4 rows)
> {code}
> Table definition on Postgres
> {code}
> postgres=# \d t222
> Table "public.t222"
> Column | Type | Modifiers
> --------+--------------+-----------
> c1 | integer |
> c2 | character(1) |
> {code}
> Data from the parquet file used in above queries on Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select * from `tblWnulls.parquet`;
> +-------------+-------+
> | c1 | c2 |
> +-------------+-------+
> | 1 | a |
> | 2 | b |
> | 13 | c |
> | 4 | c |
> | 5 | a |
> | 6 | c |
> | null | d |
> | 17 | b |
> | 8 | c |
> | 9 | b |
> | 10 | d |
> | 2147483647 | d |
> | 10 | a |
> | 11 | a |
> | null | c |
> | 11 | d |
> | 12 | c |
> | 19 | null |
> | 13 | b |
> | 14 | a |
> | 13 | c |
> | 15 | e |
> | -1 | e |
> | 0 | a |
> | 2147483647 | d |
> | null | d |
> | 65536 | null |
> | 1000000 | null |
> | null | null |
> | 11111 | a |
> +-------------+-------+
> 30 rows selected (0.145 seconds)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)