andygrove commented on code in PR #3356:
URL: https://github.com/apache/datafusion-comet/pull/3356#discussion_r2827043744


##########
spark/src/test/resources/sql-tests/expressions/map/map_from_arrays.sql:
##########
@@ -26,9 +26,7 @@ INSERT INTO test_map_from_arrays VALUES (array('a', 'b', 
'c'), array(1, 2, 3)),
 query spark_answer_only
 SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NOT NULL
 
--- Comet bug: map_from_arrays(NULL, NULL) causes native crash "map key cannot 
be null"
--- https://github.com/apache/datafusion-comet/issues/3327
-query ignore(https://github.com/apache/datafusion-comet/issues/3327)
+query spark_answer_only
 SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NULL

Review Comment:
   I ran some additional tests locally and found that the null check needs to 
cover both inputs, not just the keys. The Spark docs say `map_from_arrays` 
"returns null if either the keys array or values array is null", but the 
current `keyIsNotNullExpr` only checks `expr.left`.
   
   I added the row `(array('x'), NULL)` to the test data and a query `SELECT 
map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NOT NULL AND v IS 
NULL`, which crashes with `CometNativeException: Compute error: concat requires 
input of at least one array`.
   
   Here's a suggested updated test file that covers the missing cases. Note 
that I did use AI to generate these tests, so it is possible that there are 
mistakes.
   
   ```sql
   -- ConfigMatrix: parquet.enable.dictionary=false,true
   
   statement
   CREATE TABLE test_map_from_arrays(k array<string>, v array<int>) USING 
parquet
   
   statement
   INSERT INTO test_map_from_arrays VALUES
     (array('a', 'b', 'c'), array(1, 2, 3)),
     (array(), array()),
     (NULL, NULL),
     (array('x'), NULL),
     (NULL, array(99))
   
   -- basic functionality
   query spark_answer_only
   SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NOT NULL 
AND v IS NOT NULL
   
   -- both inputs NULL should return NULL
   query
   SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NULL AND v 
IS NULL
   
   -- keys not null but values null should return NULL (Spark behavior)
   query
   SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NOT NULL 
AND v IS NULL
   
   -- keys null but values not null should return NULL (Spark behavior)
   query
   SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NULL AND v 
IS NOT NULL
   
   -- all rows including nulls
   query spark_answer_only
   SELECT map_from_arrays(k, v) FROM test_map_from_arrays
   
   -- literal arguments
   query spark_answer_only
   SELECT map_from_arrays(array('a', 'b'), array(1, 2))
   
   -- literal null arguments
   query
   SELECT map_from_arrays(NULL, array(1, 2))
   
   query
   SELECT map_from_arrays(array('a'), NULL)
   
   query
   SELECT map_from_arrays(NULL, NULL)
   ```
   
   To fix the serde, the condition should check both inputs for null. Something 
like `IsNotNull(expr.left) AND IsNotNull(expr.right)` instead of just 
`IsNotNull(expr.left)`.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to