vbarua opened a new issue, #15006:
URL: https://github.com/apache/datafusion/issues/15006

   ### Describe the bug
   
   ## Background
   
   In the SQL standard, RESPECT NULLS and IGNORE NULLS are options to be set 
for the `lead`, `lag`, `first_value`, `last_value` and `nth_value` _window_ 
functions.
   
   > Some window functions permit a null_treatment clause that specifies how to 
handle NULL values when calculating results. This clause is optional. It is 
part of the SQL standard, but the MySQL implementation permits only RESPECT 
NULLS (which is also the default).
   
   
[MySQL](https://dev.mysql.com/doc/refman/9.2/en/window-function-descriptions.html)
   
   > The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead, 
lag, first_value, last_value, and nth_value. This is not implemented in 
PostgreSQL: the behavior is always the same as the standard's default, namely 
RESPECT NULLS. Likewise, the standard's FROM FIRST or FROM LAST option for 
nth_value is not implemented: only the default FROM FIRST behavior is 
supported. (You can achieve the result of FROM LAST by reversing the ORDER BY 
ordering.)
   
   [Postgres](https://www.postgresql.org/docs/17/functions-window.html)
   
   While MySQL and Postgres don't support the IGNORE NULLS options, there are 
systems that support it for those functions:
   * [Trino](https://trino.io/docs/current/functions/window.html)
   * 
[SqlServer](https://learn.microsoft.com/en-us/sql/t-sql/functions/lag-transact-sql?view=sql-server-ver16#-ignore-nulls--respect-nulls-)
   * [Snowflake](https://docs.snowflake.com/en/sql-reference/functions/lag)
   
   ## Bug
   
   DataFusion permits queries like
   ```sql
   SELECT FIRST_VALUE(column1) RESPECT NULLS FROM t; 
   ```
   and
   ```sql
   SELECT FIRST_VALUE(column1) IGNORE NULLS FROM t; 
   ```
   and in fact has 
[tests](https://github.com/apache/datafusion/blob/dd0fd889ea603f929accb99002e2f99280823f5c/datafusion/sqllogictest/test_files/aggregate.slt#L5866-L5874)
 for them.
   
   This doesn't conform with how most engines work with these options.
   
   A more "standard" version of these IGNORE NULLS would look like
   ```sql
   SELECT FIRST_VALUE(column1) FILTER (WHERE column1 IS NOT NULL) FROM t; 
   ```
   
   That being said, part of the weirdness here is that DataFusion defines 
`first_value` both as an [aggregate 
function](https://github.com/apache/datafusion/blob/main/datafusion/functions-aggregate/src/first_last.rs)
 and as a [window 
function](https://github.com/apache/datafusion/blob/dd0fd889ea603f929accb99002e2f99280823f5c/datafusion/functions-window/src/nth_value.rs#L41-L46).
   
   Other engines treat `first_value` solely as a window function, and provide 
something like `any_value` 
([Trino](https://trino.io/docs/current/functions/aggregate.html#any_value), 
[Postgres](https://www.postgresql.org/docs/17/functions-aggregate.html), 
[Snowflake](https://docs.snowflake.com/en/sql-reference/functions/any_value)).
   
   ### To Reproduce
   
   See tests for RESPECT NULLS and IGNORE NULLS in the codebase
   
https://github.com/apache/datafusion/blob/dd0fd889ea603f929accb99002e2f99280823f5c/datafusion/sqllogictest/test_files/aggregate.slt#L5866-L5874
   
   
   ### Expected behavior
   
   To better conform with standard SQL behaviour, at the SQL level: 
   1. `RESPECT NULLS` and `IGNORE NULLS` should not be valid options for 
_aggregate_ functions.
   2. `RESPECT NULLS` and `IGNORE NULLS` should be _only_ valid for the 
following _window_ functions: `lead`, `lag`, `first_value`, `last_value`, 
`nth_value`
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to