andygrove opened a new issue, #4539: URL: https://github.com/apache/datafusion-comet/issues/4539
## Describe the bug

Routing a map-typed expression through the JVM codegen dispatcher (`CometScalaUDF.emitJvmCodegenDispatch` / `CometCodegenDispatch`) produces incorrect results: map keys (and likely values) are corrupted in the output `MapType` array. `CometBatchKernelCodegen.canHandle` accepts `MapType` (`isSupportedDataType` returns true for maps), so the dispatcher emits a kernel for the expression, but the kernel does not marshal map output correctly back through Arrow FFI.

## To Reproduce

Routing `map_concat` through `CometCodegenDispatch` and running:

```sql
SELECT map_concat(map(1, 'a', 2, 'b'), map(3, 'c'))
```

produces:

```
Spark : Map(1 -> a, 2 -> b, 3 -> c)
Comet : Map(1 -> a, 2 -> b, 0 -> c)   -- key 3 corrupted to 0
```

The query is executed natively (a `CometProject` is produced, no fallback), so this is a wrong-answer bug, not a fallback. A map built from a column (e.g. `map_concat(map(id, 'a'), map(id + 10, 'b'))`) happened to come out correct in the same test, so the corruption appears tied to how literal / certain map entries are marshaled.

## Expected behavior

The codegen dispatcher should either evaluate map-typed output identically to Spark, or `canHandle` should reject `MapType` output so such expressions fall back cleanly instead of returning wrong results.

## Impact

This blocks routing any map-output expression through the dispatcher. Affected expressions (currently kept on the fallback / native path to avoid the bug):

- `map_concat`
- `map` / `create_map`
- `map_from_entries` (the `Incompatible` `BinaryType` key/value branch)

In #4538 these opt out of codegen dispatch (`allowIncompatCodegenDispatch = false` for `map_from_entries`; `map_concat` / `create_map` are not registered) so the bug is not currently user-visible, but it prevents arrow-native-via-dispatch coverage for map functions and is a latent correctness hazard for any future map-output dispatch.

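As a rough illustration of the second option under Expected behavior (rejecting `MapType` output so the expression falls back), the sketch below models a conservative guard over a toy type hierarchy. Everything in it is hypothetical: `SketchType`, `MapOutputGuard`, `containsMap` and `canHandleOutput` are stand-ins invented for this example, not Spark's `DataType` classes or Comet's actual `canHandle` / `isSupportedDataType` code. Rejecting maps nested inside arrays or structs is also an assumption; the issue only reports top-level map output.

```scala
// Hypothetical, simplified stand-ins for Spark SQL data types.
// These are NOT the real org.apache.spark.sql.types classes.
sealed trait SketchType
case object IntT extends SketchType
case object StringT extends SketchType
case object BinaryT extends SketchType
final case class ArrayT(element: SketchType) extends SketchType
final case class StructT(fields: List[SketchType]) extends SketchType
final case class MapT(key: SketchType, value: SketchType) extends SketchType

object MapOutputGuard {
  /** True if `t` is a map or contains a map at any nesting depth. */
  def containsMap(t: SketchType): Boolean = t match {
    case MapT(_, _)      => true
    case ArrayT(element) => containsMap(element)
    case StructT(fields) => fields.exists(containsMap)
    case _               => false
  }

  /**
   * Conservative guard (assumption): refuse codegen dispatch whenever the
   * output type involves a map, so the expression falls back instead of
   * risking a wrong answer.
   */
  def canHandleOutput(outputType: SketchType): Boolean =
    !containsMap(outputType)
}
```

With this guard, `MapOutputGuard.canHandleOutput(MapT(IntT, StringT))` is `false`, while `MapOutputGuard.canHandleOutput(ArrayT(IntT))` is `true`. A guard of this shape trades dispatch coverage for correctness until map marshaling is fixed.
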
## Additional context

- Discovered while implementing codegen-dispatch coverage in #4538 (part of #4506).
- Relevant code: `spark/src/main/scala/org/apache/comet/codegen/CometBatchKernelCodegen.scala` (`isSupportedDataType` / `canHandle`) and the `JvmScalarUdf` execution path in native code.
