Kent Yao created SPARK-55256:
--------------------------------

             Summary: [SQL] Support IGNORE NULLS / RESPECT NULLS for array_agg 
and collect_list
                 Key: SPARK-55256
                 URL: https://issues.apache.org/jira/browse/SPARK-55256
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Kent Yao


This PR adds support for the IGNORE NULLS and RESPECT NULLS clauses for 
array_agg and collect_list aggregate functions.

The SQL standard and many databases (PostgreSQL, Snowflake, DuckDB, etc.) 
support the IGNORE NULLS / RESPECT NULLS syntax for aggregate functions. 
Currently, Spark only supports this syntax for window functions like first, 
last, lead, lag, and nth_value.

By adding this support to array_agg and collect_list, users can explicitly 
control whether null values should be included in the resulting array:
- array_agg(col) IGNORE NULLS - skips null values (default behavior)
- array_agg(col) RESPECT NULLS - includes null values in the result
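
The two modes above can be illustrated with a small pure-Python model. This is only a sketch of the intended semantics, not Spark code: collect_list_model and its ignore_nulls parameter are hypothetical names, and the model preserves input order purely for readability (Spark does not guarantee element order for these aggregates without explicit ordering).

```python
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def collect_list_model(
    values: Iterable[Optional[T]], ignore_nulls: bool = True
) -> List[Optional[T]]:
    """Hypothetical model of array_agg / collect_list null handling."""
    if ignore_nulls:
        # IGNORE NULLS: drop null (None) inputs before building the array.
        return [v for v in values if v is not None]
    # RESPECT NULLS: keep every input, nulls included.
    return list(values)


rows = [1, None, 2, None, 3]
ignored = collect_list_model(rows)                        # default: IGNORE NULLS
respected = collect_list_model(rows, ignore_nulls=False)  # RESPECT NULLS

print(ignored)    # [1, 2, 3]
print(respected)  # [1, None, 2, None, 3]
```

With RESPECT NULLS the result has one element per input row, which matters when the array is later zipped or indexed against another column.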

Implementation Details:
1. Added ignoreNulls: Boolean = true parameter to CollectList class
2. array_agg now uses CollectList as they have identical behavior
3. Changed UnresolvedFunction.ignoreNulls from Boolean to Option[Boolean] to 
distinguish three cases: None (no clause given, use the function's default), 
Some(true) (IGNORE NULLS), and Some(false) (RESPECT NULLS)
4. Consolidated ignoreNulls resolution logic in FunctionResolution with shared 
resolveIgnoreNulls and applyIgnoreNulls methods
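
As a rough illustration of item 3, the three-state value can be modeled in Python with Optional[bool] standing in for Scala's Option[Boolean]. The function below borrows the name resolveIgnoreNulls from item 4, but its signature and body are assumptions made for this sketch, not the actual FunctionResolution code.

```python
from typing import Optional


def resolve_ignore_nulls(clause: Optional[bool], function_default: bool) -> bool:
    """Hypothetical stand-in for the resolution step (signature assumed).

    clause is None   -> no clause was written, fall back to the function default
    clause is True   -> IGNORE NULLS was written
    clause is False  -> RESPECT NULLS was written
    """
    return function_default if clause is None else clause


# Per item 1, CollectList defaults to ignoreNulls = true.
COLLECT_LIST_DEFAULT = True

no_clause = resolve_ignore_nulls(None, COLLECT_LIST_DEFAULT)
ignore_clause = resolve_ignore_nulls(True, COLLECT_LIST_DEFAULT)
respect_clause = resolve_ignore_nulls(False, COLLECT_LIST_DEFAULT)

# For a function whose default is to respect nulls, an absent clause
# resolves differently, which a plain Boolean could not express.
no_clause_other_default = resolve_ignore_nulls(None, False)

print(no_clause, ignore_clause, respect_clause, no_clause_other_default)
```

A plain Boolean cannot tell "the user wrote RESPECT NULLS" apart from "the user wrote nothing", so a function that ignores nulls by default would have no way to honor an explicit RESPECT NULLS.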

Users can now use IGNORE NULLS / RESPECT NULLS with array_agg and collect_list:
SELECT array_agg(col IGNORE NULLS) FROM table;
SELECT collect_list(col RESPECT NULLS) OVER (PARTITION BY id) FROM table;



