andygrove opened a new pull request, #4417:
URL: https://github.com/apache/datafusion-comet/pull/4417
## Which issue does this PR close?
Part of #3202. Follow-on to #4373.
## Rationale for this change
Nine common Spark date/time expressions are currently unsupported in Comet and force the
plan to fall back to Spark for the enclosing operator. All nine implement a real `doGenCode`
(not `CodegenFallback`) and produce Arrow-supported output types, so they slot directly into
the codegen-dispatcher pattern established for `date_format` in #4373:
`getSupportLevel = Compatible()`, `convert = emitJvmCodegenDispatch`, gated by
`spark.comet.exec.scalaUDF.codegen.enabled` (default off, experimental).
When the flag is off, behavior is unchanged: the new serdes call `withInfo` and return
`None`, and the operator falls back to Spark as it does today. When the flag is on, Spark's
own `doGenCode` runs inside the Janino-compiled Arrow-direct kernel and the operator stays
inside Comet.
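
The flag-gated behavior described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Comet's actual code: only the config key is taken from this description, while `FlagGateSketch`, `DispatchedExpr`, `fallbackReasons`, and the `convert` signature are hypothetical stand-ins (the buffer plays the role that `withInfo` has in the real serdes).

```scala
object FlagGateSketch {
  // Config key quoted from the PR description; everything else here is hypothetical.
  val CodegenFlag = "spark.comet.exec.scalaUDF.codegen.enabled"

  // Hypothetical stand-in for the serialized expression a serde would hand back.
  final case class DispatchedExpr(name: String)

  // Hypothetical stand-in for `withInfo`: records why an expression was not converted.
  val fallbackReasons = scala.collection.mutable.ListBuffer.empty[String]

  // Some(...) keeps the operator inside Comet; None means the operator falls back to Spark.
  def convert(exprName: String, conf: Map[String, String]): Option[DispatchedExpr] = {
    val enabled = conf.getOrElse(CodegenFlag, "false").toBoolean // default off
    if (enabled) {
      Some(DispatchedExpr(exprName))
    } else {
      fallbackReasons += s"$exprName requires $CodegenFlag=true"
      None
    }
  }
}
```

With an empty config the sketch records a reason and returns `None`; with the flag set to `true` it returns `Some(DispatchedExpr(...))`.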
## What changes are included in this PR?
- `CometCodegenDispatch[T <: Expression]` helper in `CometScalaUDF.scala`. Names the
  Compatible/notes/convert triple for any expression whose only routing rule is "go through
  the dispatcher".
- Nine one-line serde singletons in `datetime.scala`: `AddMonths`, `MonthsBetween`,
  `MakeTimestamp`, `MillisToTimestamp`, `MicrosToTimestamp`, `UnixSeconds`, `UnixMillis`,
  `UnixMicros`, `ToUnixTimestamp`. Each is registered in `temporalExpressions`.
- SQL fixtures under `spark/src/test/resources/sql-tests/expressions/datetime/`: one file
  per expression, plus `_ansi.sql` siblings for the three throw-capable expressions
  (`make_timestamp`, `timestamp_millis`, `to_unix_timestamp`) confirming exception
  semantics survive the dispatcher round-trip.
- Parameterized `CometCodegenSourceSuite` test that exercises `generateSource` for all nine
  expressions with realistic `BoundReference` inputs, catching `canHandle` rejections at
  the unit level.
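
As a rough illustration of why the helper reduces each serde to one line, here is a self-contained sketch. All types in it are simplified, hypothetical stand-ins (they are not the Catalyst or Comet classes), and the string returned by `convert` merely represents whatever `emitJvmCodegenDispatch` would produce.

```scala
object DispatchHelperSketch {
  // Hypothetical stand-ins for Spark expression classes (not the real Catalyst types).
  sealed trait Expression { def prettyName: String }
  final case class AddMonths(startDate: String, numMonths: Int) extends Expression {
    val prettyName = "add_months"
  }
  final case class UnixSeconds(child: String) extends Expression {
    val prettyName = "unix_seconds"
  }

  // Hypothetical support level; only the name `Compatible` comes from the description.
  sealed trait SupportLevel
  final case class Compatible(notes: Option[String]) extends SupportLevel

  // Hypothetical serde interface: a support level plus an optional conversion.
  trait ExprSerde[T <: Expression] {
    def getSupportLevel(expr: T): SupportLevel
    def convert(expr: T, codegenEnabled: Boolean): Option[String]
  }

  // Names the Compatible/notes/convert triple once, so each serde below is a one-liner.
  class CodegenDispatch[T <: Expression] extends ExprSerde[T] {
    override def getSupportLevel(expr: T): SupportLevel = Compatible(None)
    override def convert(expr: T, codegenEnabled: Boolean): Option[String] =
      if (codegenEnabled) Some(s"jvm_codegen_dispatch(${expr.prettyName})") else None
  }

  // One-line serde singletons, one per expression.
  object CometAddMonths extends CodegenDispatch[AddMonths]
  object CometUnixSeconds extends CodegenDispatch[UnixSeconds]

  // Registration map keyed by expression class, mirroring the idea of `temporalExpressions`.
  val temporalExpressions: Map[Class[_ <: Expression], ExprSerde[_ <: Expression]] = Map(
    classOf[AddMonths] -> CometAddMonths,
    classOf[UnixSeconds] -> CometUnixSeconds)
}
```

Under these assumptions, adding another expression means adding one singleton and one map entry, which is the shape the nine serdes in this PR are described as having.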
Interval-producing expressions (`MakeInterval`/`MakeYMInterval`/`MakeDTInterval`) are
explicitly out of scope because the dispatcher's `isSupportedDataType` does not currently
include Spark's interval types. Version-conditional expressions
(`TimestampAdd`/`TimestampDiff` in 3.4+, `DayName` in 3.5+, `MonthName` in 4.0+) are deferred
to a follow-on so this PR avoids touching the `CometExprShim` files.
This PR was scaffolded with the `superpowers:brainstorming` and `superpowers:writing-plans`
skills.
## How are these changes tested?
- Nine new SQL fixtures in `CometSqlFileTestSuite` (one per expression, non-UTC session
  timezone, codegen flag enabled at file scope).
- Three ANSI sibling fixtures asserting exception semantics via `query expect_error(...)`.
- Parameterized `CometCodegenSourceSuite` unit test that compiles each expression through
  `CometBatchKernelCodegen.generateSource` and asserts the generated source is non-empty.
- Existing dispatcher coverage (`CometCodegenSuite`, `CometTemporalExpressionSuite`)
  exercises the flag-off path through the shared `emitJvmCodegenDispatch` helper, so no
  per-expression "falls back to Spark" Scala tests are added.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]