andygrove opened a new pull request, #4499:
URL: https://github.com/apache/datafusion-comet/pull/4499

   ## Which issue does this PR close?
   
   Closes #4498.
   
   ## Rationale for this change
   
   In the TPC-DS golden files, Upper and Lower were marked Incompatible (Rust 
scalar function does not implement JVM/ICU case mappings) and InitCap was 
marked Incompatible (diverges on hyphen-separated words, #1052). As a result, 
queries using these expressions fell back to Spark for the enclosing operator. 
This PR keeps them inside the Comet pipeline and improves TPC-DS coverage.
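
The hyphen divergence mentioned above can be illustrated with a minimal sketch. It assumes, per the respective function documentation, that Spark's `initcap` treats only whitespace as a word separator while the Rust `initcap` treats any non-alphanumeric character as one. The class and method names below are hypothetical and are not part of Comet or Spark; this is a simplified model, not either implementation.

```java
import java.util.function.IntPredicate;

public class InitCapSketch {
    // Title-case the first letter of each word and lowercase the rest.
    // A new word starts after any code point for which isSeparator is true.
    static String initCap(String s, IntPredicate isSeparator) {
        StringBuilder out = new StringBuilder(s.length());
        boolean startOfWord = true;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (isSeparator.test(cp)) {
                out.appendCodePoint(cp);
                startOfWord = true;
            } else {
                out.appendCodePoint(
                    startOfWord ? Character.toTitleCase(cp) : Character.toLowerCase(cp));
                startOfWord = false;
            }
        }
        return out.toString();
    }

    // Assumed Spark-like rule: only whitespace separates words.
    static String whitespaceInitCap(String s) {
        return initCap(s, Character::isWhitespace);
    }

    // Assumed Rust-like rule: any non-alphanumeric character separates words.
    static String nonAlnumInitCap(String s) {
        return initCap(s, cp -> !Character.isLetterOrDigit(cp));
    }

    public static void main(String[] args) {
        System.out.println(whitespaceInitCap("jean-luc picard")); // Jean-luc Picard
        System.out.println(nonAlnumInitCap("jean-luc picard"));   // Jean-Luc Picard
    }
}
```

Under these assumptions the two rules agree on space-separated input and differ only when a hyphen or apostrophe sits inside a word.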
   
   ## What changes are included in this PR?
   
   - `CometCaseConversionBase` (Upper/Lower) and `CometInitCap` now report 
`Compatible()` and route through `CometScalaUDF.emitJvmCodegenDispatch`. The 
codegen dispatcher runs Spark's own `doGenCode` inside the Comet pipeline, so 
results match Spark's on each of Spark 3.4 / 3.5 / 4.0.
   - Existing native paths remain available as opt-ins:
     - `spark.comet.caseConversion.enabled=true` selects the Rust 
`upper`/`lower` scalar function (faster but locale-incompatible).
     - `spark.comet.expression.InitCap.allowIncompatible=true` selects the Rust 
`initcap` scalar function (faster but diverges on hyphens).
   - `CometPlanStabilitySuite` enables `COMET_SCALA_UDF_CODEGEN_ENABLED=true`. 
The TPC-DS q24, q24a, and q24b goldens were regenerated and no longer carry 
case-conversion fallback markers.
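
The selection between the two paths described in the list above can be sketched as follows. This is a simplified model rather than Comet's code: the `CaseDispatchSketch` class, the `Path` enum, and the `pathFor` helper are hypothetical, an unset flag is assumed to mean `false` because the native paths are described as opt-ins, and the interaction with the ScalaUDF codegen setting is deliberately not modeled.

```java
import java.util.Map;

public class CaseDispatchSketch {
    // Hypothetical labels for the two execution paths named in the PR.
    enum Path { NATIVE_RUST, JVM_CODEGEN }

    // Config keys quoted from the PR description.
    static final String CASE_CONVERSION_KEY = "spark.comet.caseConversion.enabled";
    static final String INITCAP_KEY = "spark.comet.expression.InitCap.allowIncompatible";

    // Assumption: a missing key behaves like "false" (the native path is opt-in).
    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    // Pick a path for an expression name given the session configuration.
    static Path pathFor(String expr, Map<String, String> conf) {
        switch (expr) {
            case "Upper":
            case "Lower":
                return flag(conf, CASE_CONVERSION_KEY) ? Path.NATIVE_RUST : Path.JVM_CODEGEN;
            case "InitCap":
                return flag(conf, INITCAP_KEY) ? Path.NATIVE_RUST : Path.JVM_CODEGEN;
            default:
                throw new IllegalArgumentException("not a case-conversion expression: " + expr);
        }
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of();
        Map<String, String> optIn = Map.of(CASE_CONVERSION_KEY, "true");
        System.out.println(pathFor("Upper", defaults)); // JVM_CODEGEN
        System.out.println(pathFor("Upper", optIn));    // NATIVE_RUST
        System.out.println(pathFor("InitCap", optIn));  // JVM_CODEGEN
    }
}
```

Note that in this model each opt-in flag affects only its own expression family, which matches the two separate keys listed in the PR.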
   
   The `implement-comet-expression` skill was used to scaffold the 
implementation.
   
   ## How are these changes tested?
   
   - Updated `upper.sql`, `lower.sql`, `init_cap.sql` to exercise the codegen 
dispatch path with `Config: spark.comet.exec.scalaUDF.codegen.enabled=true`, 
including locale-sensitive cases (German ß, Turkish dotted I, Greek sigma) and 
hyphen/apostrophe-separated names.
   - Existing `upper_enabled.sql`, `lower_enabled.sql`, `init_cap_enabled.sql` 
continue to exercise the native opt-in paths.
   - `CometStringExpressionSuite` (33 tests) and `CometSqlFileTestSuite` for 
`expressions/string` (42 tests) pass on Spark 3.5.
   - `CometTPCDSV1_4_PlanStabilitySuite` and 
`CometTPCDSV2_7_PlanStabilitySuite` pass on Spark 3.4 / 3.5 / 4.0 with the 
regenerated goldens.
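
For readers unfamiliar with the locale-sensitive inputs listed above, the following sketch shows how plain JVM `String` case mapping treats them. It is an illustration only: the `JvmCaseMappingDemo` class is hypothetical, and it exercises `java.lang.String` directly rather than Spark's or Comet's code paths.

```java
import java.util.Locale;

public class JvmCaseMappingDemo {
    // Locale-independent JVM mappings.
    static String upper(String s) { return s.toUpperCase(Locale.ROOT); }
    static String lower(String s) { return s.toLowerCase(Locale.ROOT); }

    // Turkish-locale mapping, shown only for contrast.
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        // German sharp s expands to two characters when uppercased.
        System.out.println(upper("stra\u00DFe")); // STRASSE

        // Capital I with dot above (U+0130) lowercases to "i" plus
        // combining dot above (U+0307) under the root locale.
        System.out.println(lower("\u0130").length()); // 2

        // A capital sigma at the end of a word lowercases to final sigma (U+03C2).
        String greek = lower("\u039F\u0394\u039F\u03A3");
        System.out.println(greek.charAt(greek.length() - 1) == '\u03C2'); // true

        // Under the Turkish locale, plain "I" lowercases to dotless i (U+0131).
        System.out.println(lowerTurkish("I").equals("\u0131")); // true
    }
}
```

Each of these inputs changes length or depends on context or locale, which is why they make useful regression cases for a case-conversion implementation.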


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

