andygrove opened a new issue, #4539:
URL: https://github.com/apache/datafusion-comet/issues/4539

   ## Describe the bug
   
   Routing a map-typed expression through the JVM codegen dispatcher (`CometScalaUDF.emitJvmCodegenDispatch` / `CometCodegenDispatch`) produces incorrect results: map keys (and likely values) are corrupted in the output `MapType` array.
   
   `CometBatchKernelCodegen.canHandle` accepts `MapType` (`isSupportedDataType` returns true for maps), so the dispatcher emits a kernel for the expression. However, the kernel does not marshal map output correctly back through Arrow FFI.
   
   ## To Reproduce
   
   With `map_concat` routed through `CometCodegenDispatch`, running:
   
   ```sql
   SELECT map_concat(map(1, 'a', 2, 'b'), map(3, 'c'))
   ```
   
   produces:
   
   ```
   Spark : Map(1 -> a, 2 -> b, 3 -> c)
   Comet : Map(1 -> a, 2 -> b, 0 -> c)     -- key 3 corrupted to 0
   ```
   
   The query runs natively (a `CometProject` is produced and there is no fallback), so this is a silent wrong-answer bug rather than a fallback. A map built from a column (e.g. `map_concat(map(id, 'a'), map(id + 10, 'b'))`) happened to come out correct in the same test, so the corruption appears to be tied to how literal map entries, or some other specific subset of entries, are marshaled.
   
   ## Expected behavior
   
   The codegen dispatcher should either evaluate map-typed output identically to Spark, or `canHandle` should reject `MapType` output so such expressions fall back cleanly instead of returning wrong results.
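   For the second option, a minimal sketch of a guard, assuming Spark's `org.apache.spark.sql.types` classes are on the classpath. `MapOutputGuard`, `containsMapType`, and `outputTypeAllowed` are hypothetical names; the real `isSupportedDataType` / `canHandle` signatures in `CometBatchKernelCodegen` are not shown in this issue and may differ. The guard also rejects maps nested inside arrays or structs, on the assumption that nested map output goes through the same marshaling path.

   ```scala
   import org.apache.spark.sql.types._

   // Hypothetical helper, not Comet's actual code.
   object MapOutputGuard {
     // True if `dt` is a MapType or contains one at any nesting depth.
     def containsMapType(dt: DataType): Boolean = dt match {
       case _: MapType    => true
       case a: ArrayType  => containsMapType(a.elementType)
       case s: StructType => s.fields.exists(f => containsMapType(f.dataType))
       case _             => false
     }

     // Conservative gate a `canHandle`-style check could consult: refuse codegen
     // dispatch when the result type involves a map, so the expression falls back.
     def outputTypeAllowed(dt: DataType): Boolean = !containsMapType(dt)
   }
   ```

   Checking recursively is the conservative choice: it keeps `array<map<...>>` and `struct<m: map<...>>` results off the dispatcher as well, at the cost of possibly rejecting nested cases that would have marshaled correctly.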
   
   ## Impact
   
   This blocks routing any map-output expression through the dispatcher. Affected expressions (currently kept on the fallback / native path to avoid the bug):
   
   - `map_concat`
   - `map` / `create_map`
   - `map_from_entries` (the `Incompatible` `BinaryType` key/value branch)
   
   In #4538 these opt out of codegen dispatch (`allowIncompatCodegenDispatch = false` for `map_from_entries`; `map_concat` / `create_map` are not registered), so the bug is not currently user-visible. However, it prevents arrow-native-via-dispatch coverage for map functions and is a latent correctness hazard for any future map-output dispatch.
   
   ## Additional context
   
   - Discovered while implementing codegen-dispatch coverage in #4538 (part of #4506).
   - Relevant code: `spark/src/main/scala/org/apache/comet/codegen/CometBatchKernelCodegen.scala` (`isSupportedDataType` / `canHandle`) and the `JvmScalarUdf` execution path in native code.
   