[
https://issues.apache.org/jira/browse/DRILL-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391577#comment-15391577
]
Khurram Faraaz commented on DRILL-4802:
---------------------------------------
I did some more investigation and here is what I have found.
While we do have tests that cover nulls first in our window function tests, we
do not have a test similar to the one
shown in case (ii) below. So this seems to be a general window functions issue
and is not specific to nested aggregates.
Schema details of parquet file used in below tests
{noformat}
[root@centos-01 parquet-tools]# ./parquet-schema ~/tblWnulls.parquet
message root {
optional int32 c1;
optional binary c2 (UTF8);
}
{noformat}
case (i) Order by c1 nulls first, returns correct results. Note that nulls are
first in column c1 for each group in c2.
{noformat}
0: jdbc:drill:schema=dfs.tmp> select AVG(c1) OVER(partition by c2 order by c1
nulls first), c1, c2 FROM `tblWnulls.parquet`;
+--------------------+-------------+-------+
| EXPR$0 | c1 | c2 |
+--------------------+-------------+-------+
| 0.0 | 0 | a |
| 0.5 | 1 | a |
| 2.0 | 5 | a |
| 4.0 | 10 | a |
| 5.4 | 11 | a |
| 6.833333333333333 | 14 | a |
| 1593.142857142857 | 11111 | a |
| 2.0 | 2 | b |
| 5.5 | 9 | b |
| 8.0 | 13 | b |
| 10.25 | 17 | b |
| null | null | c |
| 4.0 | 4 | c |
| 5.0 | 6 | c |
| 6.0 | 8 | c |
| 7.5 | 12 | c |
| 9.333333333333334 | 13 | c |
| 9.333333333333334 | 13 | c |
| null | null | d |
| null | null | d |
| 10.0 | 10 | d |
| 10.5 | 11 | d |
| 1.07374182875E9 | 2147483647 | d |
| 1.07374182875E9 | 2147483647 | d |
| -1.0 | -1 | e |
| 7.0 | 15 | e |
| null | null | null |
| 19.0 | 19 | null |
| 32777.5 | 65536 | null |
| 355185.0 | 1000000 | null |
+--------------------+-------------+-------+
30 rows selected (0.145 seconds)
{noformat}
case (2) order by c2 nulls first, does not return correct results. Note that
nulls are NOT first in column c1 for each group in c2.
{noformat}
0: jdbc:drill:schema=dfs.tmp> select AVG(c1) OVER(partition by c2 order by c2
nulls first), c1, c2 FROM `tblWnulls.parquet`;
+--------------------+-------------+-------+
| EXPR$0 | c1 | c2 |
+--------------------+-------------+-------+
| 1593.142857142857 | 11111 | a |
| 1593.142857142857 | 5 | a |
| 1593.142857142857 | 10 | a |
| 1593.142857142857 | 11 | a |
| 1593.142857142857 | 1 | a |
| 1593.142857142857 | 14 | a |
| 1593.142857142857 | 0 | a |
| 10.25 | 17 | b |
| 10.25 | 9 | b |
| 10.25 | 13 | b |
| 10.25 | 2 | b |
| 9.333333333333334 | 6 | c |
| 9.333333333333334 | 13 | c |
| 9.333333333333334 | 8 | c |
| 9.333333333333334 | null | c |
| 9.333333333333334 | 4 | c |
| 9.333333333333334 | 12 | c |
| 9.333333333333334 | 13 | c |
| 1.07374182875E9 | 2147483647 | d |
| 1.07374182875E9 | null | d |
| 1.07374182875E9 | 11 | d |
| 1.07374182875E9 | null | d |
| 1.07374182875E9 | 2147483647 | d |
| 1.07374182875E9 | 10 | d |
| 7.0 | 15 | e |
| 7.0 | -1 | e |
| 355185.0 | 65536 | null |
| 355185.0 | 19 | null |
| 355185.0 | null | null |
| 355185.0 | 1000000 | null |
+--------------------+-------------+-------+
30 rows selected (0.195 seconds)
{noformat}
> NULLS are not first when NULLS FIRST is used with ORDER BY in window
> definition
> -------------------------------------------------------------------------------
>
> Key: DRILL-4802
> URL: https://issues.apache.org/jira/browse/DRILL-4802
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.8.0
> Environment: 4 node CentOS cluster
> Reporter: Khurram Faraaz
> Assignee: Khurram Faraaz
>
> NULLS FIRST is not honored when used with ORDER BY inside window definition.
> This in a wrong results issue.
> MapR Drill 1.8.0 commit ID : 34ca63ba
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select c2, AVG(SUM(c1)) OVER(partition by c2
> order by c2 nulls first) FROM `tblWnulls.parquet` group by c2;
> +-------+----------------+
> | c2 | EXPR$1 |
> +-------+----------------+
> | a | 11152.0 |
> | b | 41.0 |
> | c | 56.0 |
> | d | 4.294967315E9 |
> | e | 14.0 |
> | null | 1065555.0 |
> +-------+----------------+
> 6 rows selected (0.227 seconds)
> {noformat}
> {noformat}
> postgres=# select c2, AVG(SUM(c1)) OVER(partition by c2 order by c2 nulls
> first) FROM t222 group by c2;
> c2 | avg
> ----+------------------------
> | 1065555.000000000000
> a | 11152.0000000000000000
> b | 41.0000000000000000
> c | 56.0000000000000000
> d | 4294967315.00000000
> e | 14.0000000000000000
> (6 rows)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)