andygrove opened a new issue, #4680:
URL: https://github.com/apache/datafusion-comet/issues/4680

   ### Describe the bug
   
   `MapFromArrays` and `MapFromEntries` both route through Spark's 
`ArrayBasedMapBuilder`, which:
   
   1. Throws `RuntimeException("Cannot use null as map key")` when a key is 
NULL.
   2. Applies `spark.sql.mapKeyDedupPolicy` (`EXCEPTION` vs `LAST_WIN`) for 
duplicate keys.
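
The two checks above can be sketched as a small Python model. This is a simplified illustration of the described semantics, not Spark's `ArrayBasedMapBuilder` code: the `build_map` function, its signature, and the duplicate-key error text are assumptions made for the example.

```python
from typing import Any, Dict, Hashable, Optional, Sequence

VALID_POLICIES = ("EXCEPTION", "LAST_WIN")


def build_map(keys: Sequence[Optional[Hashable]],
              values: Sequence[Any],
              dedup_policy: str = "EXCEPTION") -> Dict[Hashable, Any]:
    """Simplified model of the null-key and dedup checks (hypothetical helper)."""
    if dedup_policy not in VALID_POLICIES:
        raise ValueError(f"unknown dedup policy: {dedup_policy}")
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")

    result: Dict[Hashable, Any] = {}
    for key, value in zip(keys, values):
        # Check 1: a NULL key is rejected regardless of the dedup policy.
        if key is None:
            raise RuntimeError("Cannot use null as map key")
        # Check 2: a duplicate key either fails (EXCEPTION) or is overwritten
        # by the later value (LAST_WIN). The message below is a placeholder.
        if key in result and dedup_policy == "EXCEPTION":
            raise RuntimeError(f"Duplicate map key {key!r} was found")
        result[key] = value
    return result
```

For example, `build_map([1, 1], ["a", "b"], "LAST_WIN")` keeps the later value for key `1`, while the same call with `"EXCEPTION"` raises.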
   
   Comet does not enforce either behavior:
   
   - `CometMapFromArrays` 
(`spark/src/main/scala/org/apache/comet/serde/maps.scala`) only wraps the call 
in a CASE WHEN that handles whole-array NULLs. It does not detect a NULL 
element inside the keys array, and it does not implement either dedup policy.
   - `CometMapFromEntries` (`maps.scala`) only gates on `BinaryType` keys / 
values. The null-key and duplicate-key cases are not flagged as incompatible.
   
   For datasets containing a NULL key or duplicate keys, Comet will silently 
produce a map where Spark would throw, or apply different dedup semantics.
   
   ### Steps to reproduce
   
   Build a map via `map_from_arrays` or `map_from_entries` where the keys (the 
keys array, or the key field of the entries) contain a NULL element or 
duplicates, and compare Comet against Spark with 
`spark.sql.mapKeyDedupPolicy` set to both `EXCEPTION` and `LAST_WIN`.
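
One way to enumerate the comparison matrix is sketched below in Python, generating the SQL for each case. The table name `map_repro`, its columns, and the helper names are hypothetical. The data is written to a Parquet-backed table on the assumption that purely literal inputs could be evaluated by Spark's optimizer before Comet sees them. The expected-throw flag simply restates the Spark behavior described in this issue.

```python
from itertools import product

POLICIES = ("EXCEPTION", "LAST_WIN")
LABELS = ("null_key", "dup_key")
EXPRESSIONS = (
    "map_from_arrays(ks, vs)",
    "map_from_entries(arrays_zip(ks, vs))",
)

# Hypothetical Parquet-backed table with one row per problem case.
SETUP_SQL = (
    "CREATE TABLE map_repro USING parquet AS "
    "SELECT * FROM VALUES "
    "('null_key', array(1, CAST(NULL AS INT)), array('a', 'b')), "
    "('dup_key', array(1, 1), array('a', 'b')) "
    "AS t(label, ks, vs)"
)


def spark_should_throw(policy: str, label: str) -> bool:
    """Expected Spark outcome: NULL keys always throw, duplicates only under EXCEPTION."""
    return label == "null_key" or policy == "EXCEPTION"


def repro_cases():
    """Yield (policy, statements, spark_should_throw) for every combination."""
    for policy, label, expr in product(POLICIES, LABELS, EXPRESSIONS):
        statements = [
            f"SET spark.sql.mapKeyDedupPolicy={policy}",
            f"SELECT {expr} FROM map_repro WHERE label = '{label}'",
        ]
        yield policy, statements, spark_should_throw(policy, label)
```

Running `SETUP_SQL` once and then each case's statements through `spark.sql(...)`, with Comet enabled and disabled, should show where the two engines diverge.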
   
   ### Expected behavior
   
   `CometMapFromArrays` and `CometMapFromEntries` should declare an 
`Incompatible(Some(...))` branch (or a tighter input check) covering null-key 
rejection and dedup-policy semantics, with matching entries in 
`getIncompatibleReasons()`, so the cases fall back to Spark rather than 
diverging silently.
   
   ### Additional context
   
   Split out from #4505 (items 2 and 3), surfaced by the 
`audit-comet-expression` skill run in #4478. The two expressions share the 
`ArrayBasedMapBuilder` semantics so they are tracked together here. Distinct 
from #3327 (closed; native crash on whole-array NULL inputs).
   


