andygrove commented on code in PR #3356:
URL: https://github.com/apache/datafusion-comet/pull/3356#discussion_r2827043744
##########
spark/src/test/resources/sql-tests/expressions/map/map_from_arrays.sql:
##########
@@ -26,9 +26,7 @@ INSERT INTO test_map_from_arrays VALUES (array('a', 'b',
'c'), array(1, 2, 3)),
query spark_answer_only
SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NOT NULL
--- Comet bug: map_from_arrays(NULL, NULL) causes native crash "map key cannot
be null"
--- https://github.com/apache/datafusion-comet/issues/3327
-query ignore(https://github.com/apache/datafusion-comet/issues/3327)
+query spark_answer_only
SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NULL
Review Comment:
I ran some additional tests locally and found that the null check needs to
cover both inputs, not just the keys. The Spark docs say `map_from_arrays`
"returns null if either the keys array or values array is null", but the
current `keyIsNotNullExpr` only checks `expr.left`.
I added the row `(array('x'), NULL)` to the test data and a query `SELECT
map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NOT NULL AND v IS
NULL`, which crashes with `CometNativeException: Compute error: concat requires
input of at least one array`.
Here's a suggested updated test file that covers the missing cases. Note
that I did use AI to generate these tests, so it is possible that there are
mistakes.
```sql
-- ConfigMatrix: parquet.enable.dictionary=false,true
statement
CREATE TABLE test_map_from_arrays(k array<string>, v array<int>) USING
parquet
statement
INSERT INTO test_map_from_arrays VALUES
(array('a', 'b', 'c'), array(1, 2, 3)),
(array(), array()),
(NULL, NULL),
(array('x'), NULL),
(NULL, array(99))
-- basic functionality
query spark_answer_only
SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NOT NULL
AND v IS NOT NULL
-- both inputs NULL should return NULL
query
SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NULL AND v
IS NULL
-- keys not null but values null should return NULL (Spark behavior)
query
SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NOT NULL
AND v IS NULL
-- keys null but values not null should return NULL (Spark behavior)
query
SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NULL AND v
IS NOT NULL
-- all rows including nulls
query spark_answer_only
SELECT map_from_arrays(k, v) FROM test_map_from_arrays
-- literal arguments
query spark_answer_only
SELECT map_from_arrays(array('a', 'b'), array(1, 2))
-- literal null arguments
query
SELECT map_from_arrays(NULL, array(1, 2))
query
SELECT map_from_arrays(array('a'), NULL)
query
SELECT map_from_arrays(NULL, NULL)
```
To fix the serde, the condition should check both inputs for null. Something
like `IsNotNull(expr.left) AND IsNotNull(expr.right)` instead of just
`IsNotNull(expr.left)`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]