andygrove opened a new issue, #4617: URL: https://github.com/apache/datafusion-comet/issues/4617
## What is the problem the feature request solves?

Spark's array and map higher-order (lambda) functions currently have no Comet implementation, so any query using them falls back to Spark for the enclosing operator:

- array: `transform`, `exists`, `forall`, `aggregate`/`reduce`, `array_sort` (with comparator), `zip_with`
- map: `map_filter`, `transform_keys`, `transform_values`, `map_zip_with`

These are hard to implement natively in Rust because they evaluate an arbitrary user lambda per element.

## Describe the potential solution

The codegen dispatcher added for the regex/json families already admits `CodegenFallback` expressions, which includes all higher-order functions: `CometBatchKernelCodegen.canHandle` accepts them, and `CometCodegenHOFSuite` already proves `transform`/`filter`/`aggregate`/`exists` evaluate correctly inside the kernel when nested in a registered `ScalaUDF`.

Wiring each HOF into the serde as a `CometCodegenDispatch` makes a top-level HOF projection stay native (running Spark's own per-element evaluation inside the Comet kernel) and match Spark exactly, falling back cleanly when the dispatcher is disabled.

## Additional context

Identified while reviewing the codegen-dispatch work in #4538. Related testing-convention follow-up: #4616.
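
For readers less familiar with the functions named in the problem statement, here is a rough plain-Python model of what a few of them compute. It is only an illustration of the "arbitrary user lambda per element" shape that makes a native Rust implementation awkward. It is not Spark's or Comet's code, the helper names simply mirror the SQL function names, and Spark's null handling is ignored, apart from `zip_with` padding the shorter array, where `None` stands in for SQL NULL.

```python
from functools import reduce

# Plain-Python stand-ins for a few Spark higher-order functions.
# Assumption: lists model Spark arrays, dicts model Spark maps.

def transform(xs, f):
    # Apply the user lambda to every array element.
    return [f(x) for x in xs]

def exists(xs, pred):
    # True if the lambda holds for at least one element.
    return any(pred(x) for x in xs)

def forall(xs, pred):
    # True if the lambda holds for every element.
    return all(pred(x) for x in xs)

def aggregate(xs, zero, merge, finish=lambda acc: acc):
    # Fold the array with `merge`, then apply the optional `finish` lambda.
    return finish(reduce(merge, xs, zero))

def zip_with(xs, ys, f):
    # Pad the shorter array with None (NULL) before combining pairwise.
    n = max(len(xs), len(ys))
    pad = lambda a: list(a) + [None] * (n - len(a))
    return [f(x, y) for x, y in zip(pad(xs), pad(ys))]

def map_filter(m, pred):
    # Keep only the entries for which the (key, value) lambda holds.
    return {k: v for k, v in m.items() if pred(k, v)}

def transform_values(m, f):
    # Replace each value with the result of the (key, value) lambda.
    return {k: f(k, v) for k, v in m.items()}
```

In every case the body of the lambda is user-supplied and opaque to the engine, which is why reusing Spark's own evaluation inside the kernel, as proposed above, sidesteps reimplementing each function natively.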
