Khurram Faraaz created DRILL-3668:
-------------------------------------
Summary: Incorrect results FIRST_VALUE function
Key: DRILL-3668
URL: https://issues.apache.org/jira/browse/DRILL-3668
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Affects Versions: 1.2.0
Environment: private-branch
https://github.com/adeneche/incubator-drill/tree/new-window-funcs
Reporter: Khurram Faraaz
Assignee: Chris Westin
Query returns incorrect results for first_value column on developers private
branch.
Query results from Drill (6 rows)
{code}
0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER (
PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY
c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2)
OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER (
PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query
where firstVal_c2 = 'e' ORDER BY tile, c1;
+----------+-------+----------+---------+-------+--------------+-------------+
| c1 | c2 | lead_c2 | lag_c2 | tile | firstVal_c2 | lastVal_c2 |
+----------+-------+----------+---------+-------+--------------+-------------+
| -1 | e | e | null | 1 | e | e |
| 19 | null | null | null | 1 | e | null |
| 65536 | null | null | null | 1 | e | null |
| 15 | e | null | e | 2 | e | e |
| 1000000 | null | null | null | 2 | e | null |
| null | null | null | null | 3 | e | null |
+----------+-------+----------+---------+-------+--------------+-------------+
6 rows selected (0.269 seconds)
{code}
Query results from Postgres for same input data (2 rows returned)
{code}
postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 ORDER
BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3)
OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY
c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1)
lastVal_c2 FROM t222) sub_query where firstVal_c2 = 'e' ORDER BY tile, c1;
c1 | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2
----+----+---------+--------+------+-------------+------------
-1 | e | e | | 1 | e | e
15 | e | | e | 2 | e | e
(2 rows)
{code}
Another query that returns different results
Results returned by Drill - zero rows returned
{code}
0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER (
PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY
c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2)
OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER (
PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query
where firstVal_c2 is null ORDER BY tile, c1;
+-----+-----+----------+---------+-------+--------------+-------------+
| c1 | c2 | lead_c2 | lag_c2 | tile | firstVal_c2 | lastVal_c2 |
+-----+-----+----------+---------+-------+--------------+-------------+
+-----+-----+----------+---------+-------+--------------+-------------+
No rows selected (0.279 seconds)
{code}
Results returned by Postgres for same input data, 4 rows returned
{code}
postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 ORDER
BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3)
OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY
c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1)
lastVal_c2 FROM t222) sub_query where firstVal_c2 is null ORDER BY tile, c1;
c1 | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2
---------+----+---------+--------+------+-------------+------------
19 | | | | 1 | |
65536 | | | | 1 | |
1000000 | | | | 2 | |
| | | | 3 | |
(4 rows)
{code}
Table definition on Postgres
{code}
postgres=# \d t222
Table "public.t222"
Column | Type | Modifiers
--------+--------------+-----------
c1 | integer |
c2 | character(1) |
{code}
Data from the parquet file used in above queries on Drill
{code}
0: jdbc:drill:schema=dfs.tmp> select * from `tblWnulls.parquet`;
+-------------+-------+
| c1 | c2 |
+-------------+-------+
| 1 | a |
| 2 | b |
| 13 | c |
| 4 | c |
| 5 | a |
| 6 | c |
| null | d |
| 17 | b |
| 8 | c |
| 9 | b |
| 10 | d |
| 2147483647 | d |
| 10 | a |
| 11 | a |
| null | c |
| 11 | d |
| 12 | c |
| 19 | null |
| 13 | b |
| 14 | a |
| 13 | c |
| 15 | e |
| -1 | e |
| 0 | a |
| 2147483647 | d |
| null | d |
| 65536 | null |
| 1000000 | null |
| null | null |
| 11111 | a |
+-------------+-------+
30 rows selected (0.145 seconds)
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)