andygrove opened a new pull request, #4417:
URL: https://github.com/apache/datafusion-comet/pull/4417

## Which issue does this PR close?

Part of #3202. Follow-on to #4373.
   
## Rationale for this change

Nine common Spark date/time expressions are currently unsupported in Comet and force the enclosing operator to fall back to Spark. All nine implement a real `doGenCode` (rather than relying on `CodegenFallback`) and produce Arrow-supported output types. They therefore slot directly into the codegen-dispatcher pattern established for `date_format` in #4373:

- `getSupportLevel = Compatible()`
- `convert = emitJvmCodegenDispatch`
- gated by `spark.comet.exec.scalaUDF.codegen.enabled` (default off, experimental)

When the flag is off, behavior is unchanged. The new serdes call `withInfo` and return `None`, and the operator falls back to Spark as it does today. When the flag is on, Spark's own `doGenCode` runs inside the Janino-compiled, Arrow-direct kernel, and the operator stays inside Comet.
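The flag-gated routing described above can be modeled as follows. This is a minimal, hypothetical sketch, not Comet's code. `ExprSketch`, `DispatchProto`, `FallbackLog` and `emitDispatchSketch` are illustrative names invented here. The real `emitJvmCodegenDispatch` and `withInfo` signatures may differ. Only the configuration key is taken from this PR.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-ins: these are NOT Comet's real types or signatures.
final case class ExprSketch(prettyName: String)
final case class DispatchProto(kernel: String, expr: String)

// Collects fallback reasons, playing the role that Comet's withInfo tagging plays.
final class FallbackLog {
  private val reasons = ListBuffer.empty[String]
  def withInfo(e: ExprSketch, reason: String): Unit = reasons += s"${e.prettyName}: $reason"
  def all: List[String] = reasons.toList
}

object DispatchSketch {
  // The only name here taken from the PR itself.
  val CodegenFlag = "spark.comet.exec.scalaUDF.codegen.enabled"

  // Assumed routing: flag off (the default) -> tag a reason and return None;
  // flag on -> emit a dispatch node so the operator stays in Comet.
  def emitDispatchSketch(
      e: ExprSketch,
      conf: Map[String, String],
      log: FallbackLog): Option[DispatchProto] = {
    val enabled = conf.getOrElse(CodegenFlag, "false").toBoolean
    if (!enabled) {
      log.withInfo(e, s"$CodegenFlag is disabled")
      None // enclosing operator falls back to Spark, as today
    } else {
      // In the real system, Spark's own doGenCode would run inside the compiled kernel.
      Some(DispatchProto("jvm-codegen-dispatch", e.prettyName))
    }
  }
}
```

The flag check lives in one shared helper, so every serde inherits the same default-off fallback. This is consistent with the PR relying on existing dispatcher coverage for the flag-off path.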
   
## What changes are included in this PR?

- A `CometCodegenDispatch[T <: Expression]` helper in `CometScalaUDF.scala`. It packages the `Compatible`/notes/`convert` triple for any expression whose only routing rule is "go through the dispatcher".
- Nine one-line serde singletons in `datetime.scala`: `AddMonths`, `MonthsBetween`, `MakeTimestamp`, `MillisToTimestamp`, `MicrosToTimestamp`, `UnixSeconds`, `UnixMillis`, `UnixMicros` and `ToUnixTimestamp`. Each is registered in `temporalExpressions`.
- SQL fixtures under `spark/src/test/resources/sql-tests/expressions/datetime/`. There is one file per expression. There are also `_ansi.sql` siblings for the three throw-capable expressions (`make_timestamp`, `timestamp_millis`, `to_unix_timestamp`), confirming that exception semantics survive the dispatcher round-trip.
- A parameterized `CometCodegenSourceSuite` test that exercises `generateSource` for all nine expressions with realistic `BoundReference` inputs. It catches `canHandle` rejections at the unit level.
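The helper-plus-singletons shape can be shown with a simplified sketch. Every type here (`Expression`, `SupportLevel`, `CodegenDispatchSketch`, the `*Sketch` objects and `RegistrySketch`) is a hypothetical stand-in, not Catalyst's or Comet's real API. The sketch shows only one thing: a shared generic base class lets each per-expression serde shrink to a one-line `object` that is registered by expression class.

```scala
// Hypothetical stand-in for Catalyst's Expression; not the real class.
trait Expression { def prettyName: String }

final case class AddMonths(startDate: String, numMonths: String) extends Expression {
  val prettyName: String = "add_months"
}
final case class UnixSeconds(timestamp: String) extends Expression {
  val prettyName: String = "unix_seconds"
}

sealed trait SupportLevel
case object Compatible extends SupportLevel

// Sketch of a CometCodegenDispatch-style helper: the support-level/notes/convert
// triple is written once and shared by every dispatcher-routed expression.
abstract class CodegenDispatchSketch[T <: Expression] {
  // Every expression routed through the dispatcher reports the same support level...
  def getSupportLevel(expr: T): SupportLevel = Compatible
  // ...the same user-facing note...
  def notes: String = "Requires spark.comet.exec.scalaUDF.codegen.enabled (experimental)"
  // ...and the same conversion: dispatch when enabled, otherwise None (fall back).
  def convert(expr: T, codegenEnabled: Boolean): Option[String] =
    if (codegenEnabled) Some(s"dispatch(${expr.prettyName})") else None
}

// Each per-expression serde collapses to a one-liner.
object CometAddMonthsSketch extends CodegenDispatchSketch[AddMonths]
object CometUnixSecondsSketch extends CodegenDispatchSketch[UnixSeconds]

// Registration keyed by expression class, loosely modeled on temporalExpressions.
object RegistrySketch {
  val temporal: Map[Class[_ <: Expression], CodegenDispatchSketch[_]] = Map(
    classOf[AddMonths] -> CometAddMonthsSketch,
    classOf[UnixSeconds] -> CometUnixSecondsSketch
  )
}
```

With this shape, adding another dispatcher-only expression costs one `object` line plus one registry entry.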
   
Interval-producing expressions (`MakeInterval`/`MakeYMInterval`/`MakeDTInterval`) are explicitly out of scope, because the dispatcher's `isSupportedDataType` does not currently include Spark's interval types.

Version-conditional expressions are deferred to a follow-on PR so that this PR avoids touching the `CometExprShim` files. These are:

- `TimestampAdd`/`TimestampDiff` (3.4+)
- `DayName` (3.5+)
- `MonthName` (4.0+)

This PR was scaffolded with the `superpowers:brainstorming` and `superpowers:writing-plans` skills.
   
## How are these changes tested?

- Nine new SQL fixtures in `CometSqlFileTestSuite`: one per expression, with a non-UTC session timezone and the codegen flag enabled at file scope.
- Three ANSI sibling fixtures that assert exception semantics via `query expect_error(...)`.
- A parameterized `CometCodegenSourceSuite` unit test that compiles each expression through `CometBatchKernelCodegen.generateSource` and asserts that the generated source is non-empty.
- Existing dispatcher coverage (`CometCodegenSuite`, `CometTemporalExpressionSuite`) already exercises the flag-off path through the shared `emitJvmCodegenDispatch` helper. For that reason, no per-expression "falls back to Spark" Scala tests are added.

