[ 
https://issues.apache.org/jira/browse/HIVE-25909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567331#comment-17567331
 ] 

Alessandro Solimando commented on HIVE-25909:
---------------------------------------------

For the record, I could not find what the SQL standard dictates for the default 
of ORDER BY in windowing functions regarding NULLs, so I have compared against 
the major RDBMs and found that we Hive is aligned with every of them (apart 
from MySQL), my findings as follows:

{noformat}
SELECT username, action, amount, row_number() OVER (PARTITION BY username, 
action ORDER BY action DESC, amount DESC)
FROM event;

Oracle 11g R2:
john    buy     (null)  1
john    buy     39      2
john    buy     25      3
john    sell    20      1
john    sell    3       2

MySQL 8.0:
john    buy     39      1
john    buy     25      2
john    buy     null    3
john    sell    20      1
john    sell    3       2

Postgres 13:
john    sell    20      1
john    sell    3       2
john    buy     null    1
john    buy     39      2
john    buy     25      3

Hive:
john    sell    20      1
john    sell    3       2
john    buy     NULL    1
john    buy     39      2
john    buy     25      3
{noformat}

{noformat}
SELECT username, action, amount, row_number() OVER (PARTITION BY username, 
action ORDER BY action DESC, amount DESC NULLS LAST)
FROM event;

Oracle 11g R2:
john    buy     39      1
john    buy     25      2
john    buy     (null)  3
john    sell    20      1
john    sell    3       2

MySQL 8.0: it does not support "NULLS LAST" syntax

Postgres 13:
john    sell    20      1
john    sell    3       2
john    buy     39      1
john    buy     25      2
john    buy     null    3

Hive:
john    sell    20      1
john    sell    3       2
john    buy     39      1
john    buy     25      2
john    buy     NULL    3
{noformat}

{noformat}
SELECT username, action, amount, row_number() OVER (PARTITION BY username, 
action ORDER BY action DESC, amount DESC NULLS FIRST)
FROM event;

Oracle 11g R2:
john    buy     (null)  1
john    buy     39      2
john    buy     25      3
john    sell    20      1
john    sell    3       2

MySQL 8.0: it does not support "NULLS FIRST" syntax

Postgres 13:
john    sell    20      1
john    sell    3       2
john    buy     null    1
john    buy     39      2
john    buy     25      3

Hive:
john    sell    20      1
john    sell    3       2
john    buy     NULL    1
john    buy     39      2
john    buy     25      3
{noformat}


> Add test for 'hive.default.nulls.last' property for windows with ordering
> -------------------------------------------------------------------------
>
>                 Key: HIVE-25909
>                 URL: https://issues.apache.org/jira/browse/HIVE-25909
>             Project: Hive
>          Issue Type: Test
>          Components: CBO
>    Affects Versions: 4.0.0
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0, 4.0.0-alpha-1
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add a test around "hive.default.nulls.last" configuration property and its 
> interaction with order by clauses within windows.
> The property is known to respect such properties:
>  
> ||hive.default.nulls.last||ASC||DESC||
> |true|NULL LAST|NULL FIRST|
> |false|NULL FIRST|NULL LAST|
>  
> The test can be based along the line of the following examples:
> {noformat}
> -- hive.default.nulls.last is true by default, it sets NULLS_FIRST for DESC
> set hive.default.nulls.last;
> OUT:
> hive.default.nulls.last=true
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
> FROM test1;
> OUT:
> John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
> John Doe        1987-05-10 00:00:00     NULL    1
> John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     2
> John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     3
> John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     4
> John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     5
> -- we set hive.default.nulls.last=false, it sets NULLS_LAST for DESC
> set hive.default.nulls.last=false;
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
> FROM test1;
> OUT:
> John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
> John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1987-05-10 00:00:00     NULL    5
> -- we set hive.default.nulls.last=false but we have explicit NULLS_LAST, we 
> expect NULLS_LAST
> set hive.default.nulls.last=false;
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC 
> NULLS LAST)
> FROM test1;
> OUT:
> John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
> John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1987-05-10 00:00:00     NULL    5
> -- we have explicit NULLS_FIRST, we expect NULLS_FIRST
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC 
> NULLS FIRST)
> FROM test1;
> --OUT:
> John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
> John Doe        1987-05-10 00:00:00     NULL    1
> John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     2
> John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     3
> John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     4
> John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     5{noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to