Omega359 commented on PR #22245:
URL: https://github.com/apache/datafusion/pull/22245#issuecomment-4526310021
There is a correctness issue: the fast path reuses the first concrete
input’s list null bitmap as the output null bitmap at arrays_zip.rs. The slow
path marks an output row null only when all concrete input lists are null. The
fast path currently marks it according to the first input.
That changes semantics when inputs are perfectly aligned but have different
list-level nulls.
```sql
select arrays_zip(a,b) as zipped
from (values
(null::int[], []::int[]),
([]::int[], null::int[]),
(null::int[], null::int[])
) as t(a,b);
```
upstream/main returns:
```
[]
[]
NULL
```
This PR returns:
```
NULL
[]
NULL
```
I would suggest adding this to the arrays_zip.slt file
```
# mixed NULL list and non-null empty list: result row is non-null empty
query ?
select arrays_zip(a, b) from (values
(null::int[], []::int[]),
([]::int[], null::int[]),
(null::int[], null::int[])
) as t(a, b);
----
[]
[]
NULL
```
Attached is a benchmark update to test the performance of this change.
[arrays_zip_benchmark_main.patch](https://github.com/user-attachments/files/28181968/arrays_zip_benchmark_main.patch)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]