andygrove opened a new issue, #4680:
URL: https://github.com/apache/datafusion-comet/issues/4680
### Describe the bug
`MapFromArrays` and `MapFromEntries` both route through Spark's
`ArrayBasedMapBuilder`, which:
1. Throws `RuntimeException("Cannot use null as map key")` when a key is
NULL.
2. Applies `spark.sql.mapKeyDedupPolicy` to duplicate keys: `EXCEPTION` throws,
while `LAST_WIN` keeps the value from the last occurrence.
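
As a reference point, the two Spark behaviors above can be modeled in a few lines of plain Scala. This is a hypothetical, standalone sketch: `MapBuilderSketch`, `build`, `ExceptionOnDuplicate` and `LastWin` are illustrative names, not Spark APIs. It mirrors the builder's put loop as we understand it, where under `LAST_WIN` a duplicate key keeps its first position and takes the later value. The duplicate-key error text is illustrative only.

```scala
import scala.collection.mutable

// Hypothetical, standalone model of the ArrayBasedMapBuilder semantics.
// Not Spark's code: the real builder also has type-specific key equality,
// size limits, and its own error classes.
object MapBuilderSketch {
  sealed trait DedupPolicy
  case object ExceptionOnDuplicate extends DedupPolicy // mapKeyDedupPolicy=EXCEPTION
  case object LastWin extends DedupPolicy              // mapKeyDedupPolicy=LAST_WIN

  /** Builds ordered (key, value) entries the way a map builder would. */
  def build[K, V](keys: Seq[K], values: Seq[V], policy: DedupPolicy): List[(K, V)] = {
    require(keys.length == values.length, "keys and values must have the same length")
    val keyToIndex = mutable.HashMap.empty[K, Int]
    val outKeys = mutable.ArrayBuffer.empty[K]
    val outValues = mutable.ArrayBuffer.empty[V]
    keys.zip(values).foreach { case (k, v) =>
      // 1. A NULL key is rejected outright, regardless of the dedup policy.
      if (k == null) throw new RuntimeException("Cannot use null as map key")
      keyToIndex.get(k) match {
        case None =>
          // First occurrence: remember where this key lives.
          keyToIndex(k) = outKeys.length
          outKeys += k
          outValues += v
        case Some(i) =>
          policy match {
            // 2a. EXCEPTION: any duplicate key is an error.
            case ExceptionOnDuplicate =>
              throw new RuntimeException(s"Duplicate map key $k was found")
            // 2b. LAST_WIN: keep the first position, overwrite with the later value.
            case LastWin =>
              outValues(i) = v
          }
      }
    }
    outKeys.zip(outValues).toList
  }
}
```

For example, `build(Seq("a", "b", "a"), Seq(1, 2, 3), LastWin)` returns `List((a,3), (b,2))`, while the same call with `ExceptionOnDuplicate` throws, and any input containing a `null` key throws under either policy.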
Comet does not enforce either behavior:
- `CometMapFromArrays`
(`spark/src/main/scala/org/apache/comet/serde/maps.scala`) only wraps the call
in a CASE WHEN that handles whole-array NULLs. It does not detect a NULL
element inside the keys array, and it does not implement either dedup policy.
- `CometMapFromEntries` (`maps.scala`) only gates on `BinaryType` keys /
values. The null-key and duplicate-key cases are not flagged as incompatible.
For datasets containing a NULL key or duplicate keys, Comet silently
produces a map where Spark would throw, or resolves duplicates differently from Spark.
### Steps to reproduce
Build a map via `map_from_arrays` or `map_from_entries` where the keys array
contains a NULL element, or contains duplicate keys, and compare Comet against
Spark with `spark.sql.mapKeyDedupPolicy` set to both `EXCEPTION` and `LAST_WIN`.
### Expected behavior
`CometMapFromArrays` and `CometMapFromEntries` should declare an
`Incompatible(Some(...))` branch (or a tighter input check) covering null-key
rejection and dedup-policy semantics, with matching entries in
`getIncompatibleReasons()`, so the cases fall back to Spark rather than
diverging silently.
### Additional context
Split out from #4505 (items 2 and 3), surfaced by the
`audit-comet-expression` skill run in #4478. The two expressions share the
`ArrayBasedMapBuilder` semantics so they are tracked together here. Distinct
from #3327 (closed; native crash on whole-array NULL inputs).