andygrove opened a new pull request, #4618:
URL: https://github.com/apache/datafusion-comet/pull/4618

   ## Which issue does this PR close?
   
   Closes #4617.
   
   ## Rationale for this change
   
   Spark's array and map higher-order (lambda) functions had no Comet implementation, so any query using them fell back to Spark for the enclosing operator. They are hard to implement natively in Rust because each one evaluates an arbitrary user-supplied lambda per element.
   
   The codegen dispatcher already admits `CodegenFallback` expressions, a category that includes all higher-order functions: `CometBatchKernelCodegen.canHandle` accepts them, and `CometCodegenHOFSuite` already shows that they evaluate correctly inside the kernel when nested in a registered `ScalaUDF`. Wiring each HOF into the serde lets a top-level HOF projection stay native instead of falling back, while still matching Spark exactly, because the kernel runs Spark's own per-element evaluation.
   
   ## What changes are included in this PR?
   
   Register the following previously unsupported higher-order functions as `CometCodegenDispatch` (there is no native Rust path; they go through the codegen dispatcher):
   
   - array: `transform`, `exists`, `forall`, `aggregate`/`reduce`, `array_sort` (with comparator), `zip_with`
   - map: `map_filter`, `transform_keys`, `transform_values`, `map_zip_with`
   
   When `spark.comet.exec.scalaUDF.codegen.enabled=false`, these fall back to 
Spark cleanly.
   
   `array_filter` with a general lambda is intentionally left out: it already 
has a partial native serde (the `array_compact` / `IsNotNull` special case) 
that reports `Unsupported` for general lambdas, so routing it through the 
dispatcher is a separate, more involved change.
   
   ## How are these changes tested?
   
   Adds `CometHigherOrderFunctionSuite`, which, for each function, asserts that the projection stays native and matches Spark (`checkSparkAnswerAndOperator`) over Parquet-backed array/map columns that include null and empty rows. It also tests that an array HOF and a map HOF fall back to Spark when the dispatcher is disabled.
   