andygrove opened a new pull request, #4618: URL: https://github.com/apache/datafusion-comet/pull/4618
## Which issue does this PR close?

Closes #4617.

## Rationale for this change

Spark's array and map higher-order (lambda) functions had no Comet implementation, so any query using them fell back to Spark for the enclosing operator. They are hard to implement natively in Rust because they evaluate an arbitrary user lambda per element.

The codegen dispatcher already admits `CodegenFallback` expressions, which include all higher-order functions: `CometBatchKernelCodegen.canHandle` accepts them, and `CometCodegenHOFSuite` already proved that they evaluate correctly inside the kernel when nested in a registered `ScalaUDF`. Wiring each HOF into the serde lets a top-level HOF projection stay native instead of falling back, while still matching Spark exactly (the kernel runs Spark's own per-element evaluation).

## What changes are included in this PR?

Register the following previously unsupported higher-order functions as `CometCodegenDispatch` (there is no native Rust path; they run through the codegen dispatcher):

- array: `transform`, `exists`, `forall`, `aggregate`/`reduce`, `array_sort` (with comparator), `zip_with`
- map: `map_filter`, `transform_keys`, `transform_values`, `map_zip_with`

When `spark.comet.exec.scalaUDF.codegen.enabled=false`, these fall back to Spark cleanly.

`array_filter` with a general lambda is intentionally left out: it already has a partial native serde (the `array_compact` / `IsNotNull` special case) that reports `Unsupported` for general lambdas, so routing it through the dispatcher is a separate, more involved change.

## How are these changes tested?

Adds `CometHigherOrderFunctionSuite`, which, for each function, asserts that the projection stays native and matches Spark (`checkSparkAnswerAndOperator`) over Parquet-backed array/map columns, including null and empty rows. It also tests that an array HOF and a map HOF fall back to Spark when the dispatcher is disabled.
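The routing described above boils down to a small decision table. The sketch below is a hypothetical, simplified model of that table, not Comet's actual serde API: `Route`, `route`, and the `lambda_is_not_null` flag are illustrative names, and the `codegen_enabled` argument stands in for `spark.comet.exec.scalaUDF.codegen.enabled`.

```python
from enum import Enum


class Route(Enum):
    """Where this toy model sends an expression (names are hypothetical)."""
    NATIVE = "native"            # dedicated native Rust implementation
    CODEGEN_DISPATCH = "codegen"  # Spark's own per-element eval inside the codegen kernel
    FALLBACK = "spark"            # enclosing operator falls back to Spark


# HOFs registered as CometCodegenDispatch in this PR.
# `array_sort` here means the comparator (lambda) form.
ARRAY_HOFS = frozenset(
    {"transform", "exists", "forall", "aggregate", "reduce", "array_sort", "zip_with"}
)
MAP_HOFS = frozenset({"map_filter", "transform_keys", "transform_values", "map_zip_with"})


def route(func: str, codegen_enabled: bool, lambda_is_not_null: bool = False) -> Route:
    """Toy model of the routing decision for a top-level HOF projection."""
    if func == "array_filter":
        # Only the array_compact / IsNotNull special case has a native path;
        # a general lambda is still reported as unsupported (left out of this PR).
        return Route.NATIVE if lambda_is_not_null else Route.FALLBACK
    if func in ARRAY_HOFS | MAP_HOFS:
        # No native Rust path: ride the codegen dispatcher, or fall back when disabled.
        return Route.CODEGEN_DISPATCH if codegen_enabled else Route.FALLBACK
    # Anything else is outside the scope of this toy model.
    return Route.FALLBACK


if __name__ == "__main__":
    for name in ("transform", "map_zip_with"):
        on = route(name, codegen_enabled=True).value
        off = route(name, codegen_enabled=False).value
        print(name, on, off)  # prints "<name> codegen spark" for both names
```

The model keeps `array_filter` separate on purpose: its existing partial native serde takes precedence, so only the `IsNotNull` lambda stays native, while a general lambda keeps reporting unsupported.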
