andygrove opened a new pull request, #4454:
URL: https://github.com/apache/datafusion-comet/pull/4454
## Which issue does this PR close?
Closes #.
## Rationale for this change
Spark's `GetTimestamp` expression powers the format-based variants of
`to_timestamp(s, fmt)`, `to_date(s, fmt)`, `to_timestamp_ntz(s, fmt)`, and
`try_to_timestamp(s, fmt)`. Until now Comet had no serde for it, so any plan
containing those calls fell back to Spark for the enclosing operator.
A native re-implementation would have to mirror Spark's full
`SimpleDateFormat` / `TimestampFormatter` parsing semantics, including time
parser policy (LEGACY/CORRECTED/EXCEPTION), ANSI failure modes, timezone
handling, and locale awareness. The upstream `datafusion-spark` crate has no
equivalent function, so a from-scratch native implementation would be a large
project unlikely to be 100% Spark-compatible without significant follow-up work.
The closely-related `ToUnixTimestamp` already uses the codegen dispatcher
(`CometCodegenDispatch`) to run Spark's own `doGenCode` inside the Comet
pipeline. Wiring `GetTimestamp` the same way produces bit-exact results today
and works uniformly across Spark 3.4, 3.5, and 4.0.
## What changes are included in this PR?
- Adds `object CometGetTimestamp extends CometCodegenDispatch[GetTimestamp]`
in `spark/src/main/scala/org/apache/comet/serde/datetime.scala`.
- Registers `classOf[GetTimestamp] -> CometGetTimestamp` in the
`temporalExpressions` map in `QueryPlanSerde.scala`.
- Adds two SQL file tests:
-
`spark/src/test/resources/sql-tests/expressions/datetime/get_timestamp.sql`
covers `to_timestamp(s, fmt)`, `to_date(s, fmt)`, `to_timestamp_ntz(s, fmt)`,
`try_to_timestamp(s, fmt)`, NULLs, parse errors, column references, literals,
and column-as-format.
-
`spark/src/test/resources/sql-tests/expressions/datetime/get_timestamp_ansi.sql`
covers ANSI mode error throwing with `expect_error(CANNOT_PARSE_TIMESTAMP)`
and the non-throwing `try_to_timestamp` fallback.
The `implement-comet-expression` skill was used to scaffold the
implementation.
## How are these changes tested?
```
./mvnw test -Pspark-3.5 -Dtest=none
-Dsuites="org.apache.comet.CometSqlFileTestSuite get_timestamp"
-DwildcardSuites=none
```
Both new test files pass. The existing `to_timestamp_*`,
`try_to_timestamp_*`, and related SQL file tests continue to pass under the
same profile (12 tests total).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]