[
https://issues.apache.org/jira/browse/HIVE-25909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567331#comment-17567331
]
Alessandro Solimando commented on HIVE-25909:
---------------------------------------------
For the record, I could not find what the SQL standard dictates for the default
of ORDER BY in windowing functions regarding NULLs, so I have compared against
the major RDBMs and found that we Hive is aligned with every of them (apart
from MySQL), my findings as follows:
{noformat}
SELECT username, action, amount, row_number() OVER (PARTITION BY username,
action ORDER BY action DESC, amount DESC)
FROM event;
Oracle 11g R2:
john buy (null) 1
john buy 39 2
john buy 25 3
john sell 20 1
john sell 3 2
MySQL 8.0:
john buy 39 1
john buy 25 2
john buy null 3
john sell 20 1
john sell 3 2
Postgres 13:
john sell 20 1
john sell 3 2
john buy null 1
john buy 39 2
john buy 25 3
Hive:
john sell 20 1
john sell 3 2
john buy NULL 1
john buy 39 2
john buy 25 3
{noformat}
{noformat}
SELECT username, action, amount, row_number() OVER (PARTITION BY username,
action ORDER BY action DESC, amount DESC NULLS LAST)
FROM event;
Oracle 11g R2:
john buy 39 1
john buy 25 2
john buy (null) 3
john sell 20 1
john sell 3 2
MySQL 8.0: it does not support "NULLS LAST" syntax
Postgres 13:
john sell 20 1
john sell 3 2
john buy 39 1
john buy 25 2
john buy null 3
Hive:
john sell 20 1
john sell 3 2
john buy 39 1
john buy 25 2
john buy NULL 3
{noformat}
{noformat}
SELECT username, action, amount, row_number() OVER (PARTITION BY username,
action ORDER BY action DESC, amount DESC NULLS FIRST)
FROM event;
Oracle 11g R2:
john buy (null) 1
john buy 39 2
john buy 25 3
john sell 20 1
john sell 3 2
MySQL 8.0: it does not support "NULLS FIRST" syntax
Postgres 13:
john sell 20 1
john sell 3 2
john buy null 1
john buy 39 2
john buy 25 3
Hive:
john sell 20 1
john sell 3 2
john buy NULL 1
john buy 39 2
john buy 25 3
{noformat}
> Add test for 'hive.default.nulls.last' property for windows with ordering
> -------------------------------------------------------------------------
>
> Key: HIVE-25909
> URL: https://issues.apache.org/jira/browse/HIVE-25909
> Project: Hive
> Issue Type: Test
> Components: CBO
> Affects Versions: 4.0.0
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0, 4.0.0-alpha-1
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Add a test around "hive.default.nulls.last" configuration property and its
> interaction with order by clauses within windows.
> The property is known to respect such properties:
>
> ||hive.default.nulls.last||ASC||DESC||
> |true|NULL LAST|NULL FIRST|
> |false|NULL FIRST|NULL LAST|
>
> The test can be based along the line of the following examples:
> {noformat}
> -- hive.default.nulls.last is true by default, it sets NULLS_FIRST for DESC
> set hive.default.nulls.last;
> OUT:
> hive.default.nulls.last=true
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
> FROM test1;
> OUT:
> John Doe 1990-05-10 00:00:00 2022-01-10 00:00:00 1
> John Doe 1990-05-10 00:00:00 2021-12-10 00:00:00 2
> John Doe 1990-05-10 00:00:00 2021-11-10 00:00:00 3
> John Doe 1990-05-10 00:00:00 2021-10-10 00:00:00 4
> John Doe 1990-05-10 00:00:00 2021-09-10 00:00:00 5
> John Doe 1987-05-10 00:00:00 NULL 1
> John Doe 1987-05-10 00:00:00 2022-01-10 00:00:00 2
> John Doe 1987-05-10 00:00:00 2021-12-10 00:00:00 3
> John Doe 1987-05-10 00:00:00 2021-11-10 00:00:00 4
> John Doe 1987-05-10 00:00:00 2021-10-10 00:00:00 5
> -- we set hive.default.nulls.last=false, it sets NULLS_LAST for DESC
> set hive.default.nulls.last=false;
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
> FROM test1;
> OUT:
> John Doe 1990-05-10 00:00:00 2022-01-10 00:00:00 1
> John Doe 1990-05-10 00:00:00 2021-12-10 00:00:00 2
> John Doe 1990-05-10 00:00:00 2021-11-10 00:00:00 3
> John Doe 1990-05-10 00:00:00 2021-10-10 00:00:00 4
> John Doe 1990-05-10 00:00:00 2021-09-10 00:00:00 5
> John Doe 1987-05-10 00:00:00 2022-01-10 00:00:00 1
> John Doe 1987-05-10 00:00:00 2021-12-10 00:00:00 2
> John Doe 1987-05-10 00:00:00 2021-11-10 00:00:00 3
> John Doe 1987-05-10 00:00:00 2021-10-10 00:00:00 4
> John Doe 1987-05-10 00:00:00 NULL 5
> -- we set hive.default.nulls.last=false but we have explicit NULLS_LAST, we
> expect NULLS_LAST
> set hive.default.nulls.last=false;
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC
> NULLS LAST)
> FROM test1;
> OUT:
> John Doe 1990-05-10 00:00:00 2022-01-10 00:00:00 1
> John Doe 1990-05-10 00:00:00 2021-12-10 00:00:00 2
> John Doe 1990-05-10 00:00:00 2021-11-10 00:00:00 3
> John Doe 1990-05-10 00:00:00 2021-10-10 00:00:00 4
> John Doe 1990-05-10 00:00:00 2021-09-10 00:00:00 5
> John Doe 1987-05-10 00:00:00 2022-01-10 00:00:00 1
> John Doe 1987-05-10 00:00:00 2021-12-10 00:00:00 2
> John Doe 1987-05-10 00:00:00 2021-11-10 00:00:00 3
> John Doe 1987-05-10 00:00:00 2021-10-10 00:00:00 4
> John Doe 1987-05-10 00:00:00 NULL 5
> -- we have explicit NULLS_FIRST, we expect NULLS_FIRST
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC
> NULLS FIRST)
> FROM test1;
> --OUT:
> John Doe 1990-05-10 00:00:00 2022-01-10 00:00:00 1
> John Doe 1990-05-10 00:00:00 2021-12-10 00:00:00 2
> John Doe 1990-05-10 00:00:00 2021-11-10 00:00:00 3
> John Doe 1990-05-10 00:00:00 2021-10-10 00:00:00 4
> John Doe 1990-05-10 00:00:00 2021-09-10 00:00:00 5
> John Doe 1987-05-10 00:00:00 NULL 1
> John Doe 1987-05-10 00:00:00 2022-01-10 00:00:00 2
> John Doe 1987-05-10 00:00:00 2021-12-10 00:00:00 3
> John Doe 1987-05-10 00:00:00 2021-11-10 00:00:00 4
> John Doe 1987-05-10 00:00:00 2021-10-10 00:00:00 5{noformat}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)