Jefffrey opened a new issue, #19682:
URL: https://github.com/apache/datafusion/issues/19682

   Current behaviour is per this issue comment: https://github.com/apache/datafusion/issues/8181#issuecomment-1817864491
   
   > for `array_union`:
   > ```
   > array_union([], []) = []
   > array_union([], null) = []
   > array_union(null, []) = []
   > array_union(null, null) = null
   > ```
   > 
   > for `array_except`:
   > ```
   > array_except([], []) = []
   > array_except([], null) = []
   > array_except(null, []) = null
   > array_except(null, null) = null
   > ```
   > 
   > for `array_intersect`:
   > ```
   > array_intersect([], []) = []
   > array_intersect([], null) = []
   > array_intersect(null, []) = []
   > array_intersect(null, null) = null
   > ```
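   
   For reference, a minimal sketch of how these cases can be exercised in datafusion-cli, assuming `make_array()` produces an empty list (the expected results in the comments are taken from the quoted comment above, not re-verified):
   
   ```sql
   -- Empty vs. NULL inputs, mirroring the quoted table (results are assumptions).
   select array_union(make_array(), make_array());   -- []
   select array_union(make_array(), NULL);           -- []
   select array_except(NULL, make_array());          -- NULL
   select array_intersect(NULL, NULL);               -- NULL
   ```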
   
   I find this confusing, as with most other functions we return null if either input is null. A recent PR:
   
   - https://github.com/apache/datafusion/pull/19415
   
   now implements this for array inputs (for union and intersect): we return null for a row if either input list for that row is null. Scalar null inputs still follow the behaviour above, as that was a separate code path (see the sketch below).
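   
   To make the scalar vs. array-input distinction concrete, here is a hedged sketch of the two code paths (the `make_array` usage and the expected results in the comments are assumptions based on the behaviour described above, not verified output):
   
   ```sql
   -- Scalar code path: the second argument is a literal NULL (datatype Null).
   -- Per the quoted comment, array_union([], null) = [] on this path (assumption).
   select array_union(make_array(), NULL);
   
   -- Array code path: the null shows up as a row value in a list column.
   -- Per #19415, union/intersect now return NULL for that row (assumption).
   select array_union(a, b)
   from (values (make_array(1), make_array(2)),
                (make_array(3), NULL)) as t(a, b);
   ```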
   
   ## Spark behaviour
   
   Spark consistently returns null if either input is null:
   
   ```python
   >>> spark.sql("select array_except(cast(null as array<int>), 
array(1))").show()
   +----------------------------+
   |array_except(NULL, array(1))|
   +----------------------------+
   |                        NULL|
   +----------------------------+
   
   >>> spark.sql("select array_except(array(1), cast(null as 
array<int>))").show()
   +----------------------------+
   |array_except(array(1), NULL)|
   +----------------------------+
   |                        NULL|
   +----------------------------+
   
   >>> spark.sql("select array_union(cast(null as array<int>), 
array(1))").show()
   +---------------------------+
   |array_union(NULL, array(1))|
   +---------------------------+
   |                       NULL|
   +---------------------------+
   
   >>> spark.sql("select array_union(array(1), cast(null as 
array<int>))").show()
   +---------------------------+
   |array_union(array(1), NULL)|
   +---------------------------+
   |                       NULL|
   +---------------------------+
   
   >>> spark.sql("select array_intersect(cast(null as array<int>), 
array(1))").show()
   +-------------------------------+
   |array_intersect(NULL, array(1))|
   +-------------------------------+
   |                           NULL|
   +-------------------------------+
   
   >>> spark.sql("select array_intersect(array(1), cast(null as 
array<int>))").show()
   +-------------------------------+
   |array_intersect(array(1), NULL)|
   +-------------------------------+
   |                           NULL|
   +-------------------------------+
   ```
   
   ## ClickHouse behaviour
   
   ClickHouse consistently returns null if either input is null:
   
   ```sql
   :) select arrayExcept(null, [1]);
   
   SELECT arrayExcept(NULL, [1])
   
   Query id: ca4fab63-30c0-4e45-8358-b9d23e495357
   
      ┌─arrayExcept(NULL, [1])─┐
   1. │ ᴺᵁᴸᴸ                   │
      └────────────────────────┘
   
   1 row in set. Elapsed: 0.002 sec.
   
   :) select arrayExcept([1], null);
   
   SELECT arrayExcept([1], NULL)
   
   Query id: 4770abf8-0de1-47b7-889c-0bcb400a232f
   
      ┌─arrayExcept([1], NULL)─┐
   1. │ ᴺᵁᴸᴸ                   │
      └────────────────────────┘
   
   1 row in set. Elapsed: 0.001 sec.
   
   :) select arrayUnion(null, [1]);
   
   SELECT arrayUnion(NULL, [1])
   
   Query id: e8c75415-2e0c-4a65-a7be-69dbd81f4546
   
      ┌─arrayUnion(NULL, [1])─┐
   1. │ ᴺᵁᴸᴸ                  │
      └───────────────────────┘
   
   1 row in set. Elapsed: 0.001 sec.
   
   :) select arrayUnion([1], null);
   
   SELECT arrayUnion([1], NULL)
   
   Query id: 21aabafd-8995-4a42-b7b9-4eb7e66746fd
   
      ┌─arrayUnion([1], NULL)─┐
   1. │ ᴺᵁᴸᴸ                  │
      └───────────────────────┘
   
   1 row in set. Elapsed: 0.001 sec.
   
   :) select arrayIntersect(null, [1]);
   
   SELECT arrayIntersect(NULL, [1])
   
   Query id: 3c161ed4-a244-4793-957f-925cdf6e5e0e
   
      ┌─arrayIntersect(NULL, [1])─┐
   1. │ ᴺᵁᴸᴸ                      │
      └───────────────────────────┘
   
   1 row in set. Elapsed: 0.002 sec.
   
   :) select arrayIntersect([1], null);
   
   SELECT arrayIntersect([1], NULL)
   
   Query id: 9e486d9a-6129-40ea-947f-e8f63e7957e4
   
      ┌─arrayIntersect([1], NULL)─┐
   1. │ ᴺᵁᴸᴸ                      │
      └───────────────────────────┘
   
   1 row in set. Elapsed: 0.002 sec.
   ```
   
   ## DuckDB behaviour
   
   DuckDB only supports array_intersect, and whether the result is null depends only on the first argument:
   
   ```sql
   D select array_intersect(null, [1]);
   ┌───────────────────────────────────────────┐
   │ array_intersect(NULL, main.list_value(1)) │
   │                   int32                   │
   ├───────────────────────────────────────────┤
   │                   NULL                    │
   └───────────────────────────────────────────┘
   D select array_intersect([1], null);
   ┌───────────────────────────────────────────┐
   │ array_intersect(main.list_value(1), NULL) │
   │                  int32[]                  │
   ├───────────────────────────────────────────┤
   │ []                                        │
   └───────────────────────────────────────────┘
   ```
   
   Given that Spark and ClickHouse always return null, and that most of our other functions return null when any input is null, I think we should change the array union/intersect/except functions to return null if either input is null, regardless of whether the null arrives as a scalar null argument (with datatype Null) or as a null value in the current row of an array input.
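   
   A hedged sketch of the proposed semantics (the expected results in the comments describe the proposal, not current behaviour; `make_array` and the VALUES form are only illustrative):
   
   ```sql
   -- Proposed semantics (sketch only): NULL whenever either input is NULL,
   -- for both scalar null arguments and null rows of array inputs.
   select array_union(NULL, make_array(1));          -- proposed: NULL
   select array_except(make_array(1), NULL);         -- proposed: NULL
   select array_intersect(a, b)
   from (values (make_array(1), make_array(1)),
                (make_array(1), NULL)) as t(a, b);   -- proposed: [1], NULL
   ```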

