[ 
https://issues.apache.org/jira/browse/SPARK-55256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-55256.
------------------------------
    Fix Version/s: 4.2.0
       Resolution: Fixed

Issue resolved by pull request 54034
[https://github.com/apache/spark/pull/54034]

> [SQL] Support IGNORE NULLS / RESPECT NULLS for array_agg and collect_list
> -------------------------------------------------------------------------
>
>                 Key: SPARK-55256
>                 URL: https://issues.apache.org/jira/browse/SPARK-55256
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.2.0
>
>
> This PR adds support for the IGNORE NULLS and RESPECT NULLS clauses for the 
> array_agg and collect_list aggregate functions.
> The SQL standard and many databases (PostgreSQL, Snowflake, DuckDB, etc.) 
> support the IGNORE NULLS / RESPECT NULLS syntax for aggregate functions. 
> Currently, Spark only supports this syntax for window functions like first, 
> last, lead, lag, and nth_value.
> By adding this support to array_agg and collect_list, users can explicitly 
> control whether null values should be included in the resulting array:
> - array_agg(col) IGNORE NULLS - skips null values (default behavior)
> - array_agg(col) RESPECT NULLS - includes null values in the result
>
> Implementation Details:
> 1. Added an ignoreNulls: Boolean = true parameter to the CollectList class
> 2. array_agg now uses CollectList, since the two have identical behavior
> 3. Changed UnresolvedFunction.ignoreNulls from Boolean to Option[Boolean] to 
> distinguish between None (use the function default), Some(true) (IGNORE NULLS), 
> and Some(false) (RESPECT NULLS)
> 4. Consolidated the ignoreNulls resolution logic in FunctionResolution into 
> shared resolveIgnoreNulls and applyIgnoreNulls methods
> Users can now use IGNORE NULLS / RESPECT NULLS with array_agg and 
> collect_list:
> SELECT array_agg(col IGNORE NULLS) FROM table;
> SELECT collect_list(col RESPECT NULLS) OVER (PARTITION BY id) FROM table;
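
The semantics described above can be sketched in a few lines of plain Python. This is only an illustrative model, not Spark's implementation: the names resolve_ignore_nulls, collect_list, and COLLECT_LIST_DEFAULT_IGNORE_NULLS are hypothetical stand-ins, and Optional[bool] plays the role of the Option[Boolean] mentioned in the issue (None means no clause was written, True means IGNORE NULLS, False means RESPECT NULLS).

```python
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

# Assumption taken from the issue text: collect_list / array_agg skip nulls
# by default. The constant name itself is hypothetical.
COLLECT_LIST_DEFAULT_IGNORE_NULLS = True


def resolve_ignore_nulls(clause: Optional[bool], function_default: bool) -> bool:
    """Map the tri-state clause value to a concrete flag.

    None  -> no clause written, fall back to the function's default
    True  -> IGNORE NULLS
    False -> RESPECT NULLS
    """
    return function_default if clause is None else clause


def collect_list(
    values: Iterable[Optional[T]], clause: Optional[bool] = None
) -> List[Optional[T]]:
    """Collect values in input order, dropping None only when nulls are ignored."""
    ignore_nulls = resolve_ignore_nulls(clause, COLLECT_LIST_DEFAULT_IGNORE_NULLS)
    return [v for v in values if not (ignore_nulls and v is None)]


rows = [1, None, 2, None, 3]
print(collect_list(rows))         # no clause: [1, 2, 3]
print(collect_list(rows, True))   # IGNORE NULLS: [1, 2, 3]
print(collect_list(rows, False))  # RESPECT NULLS: [1, None, 2, None, 3]
```

Keeping the clause as a tri-state value, rather than a plain boolean, is what lets each function supply its own default when the query does not mention nulls at all.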



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

