andygrove opened a new pull request, #4448: URL: https://github.com/apache/datafusion-comet/pull/4448
## Which issue does this PR close? Closes #. ## Rationale for this change Only `from_utc_timestamp` and `to_utc_timestamp` had per-version audit sub-bullets in `spark_expressions_support.md`. The rest of the `datetime_funcs` section is implemented but undocumented as to which Spark versions it has been validated against. This PR brings every implemented `[x]` date/time function (38 SQL names backed by 33 Comet serde objects) under the same audit format so contributors can see at a glance which versions have been checked and what known divergences exist. This is the first multi-expression application of the `audit-comet-expression` skill following the consistency-rule clarifications added in #4447. ## What changes are included in this PR? Adds sub-bullets to every `[x]` entry under `### datetime_funcs` in `docs/source/contributor-guide/spark_expressions_support.md`, except the five entries that have no dedicated Comet serde (`current_timezone`, `date_part`, `datepart`, `extract`, `localtimestamp` route through Spark optimizer rewrites to other expressions or evaluate to constants). For each function, the sub-bullets record: - Whether the Spark class is identical across 3.4.3, 3.5.8, and 4.0.1. - Spark 4.0 changes. Universally the `NullIntolerant` trait migrated to a `nullIntolerant: Boolean` field override; many string-typed `inputTypes` widened to `StringTypeWithCollation(supportsTrimCollation = true)`; ANSI error helpers were renamed. - Known divergences between Comet and Spark, with tracking-issue links where applicable. The work was driven by 8 parallel agents using the audit-comet-expression skill, each handling a related group of expressions: | Group | Expressions | Pattern | | --- | --- | --- | | A | `add_months`, `months_between`, `make_timestamp`, `timestamp_micros`, `timestamp_millis`, `unix_seconds`, `unix_millis`, `unix_micros`, `to_unix_timestamp` | JVM codegen dispatcher | | B | `year`, `month`, `day`, `dayofmonth`, `dayofweek`, `weekday`, `dayofyear`, `weekofyear`, `quarter` | `CometExprGetDateField` | | C | `hour`, `minute`, `second` | Timezone-aware field extractors | | D | `date_add`, `dateadd`, `date_sub`, `date_diff`, `datediff`, `date_from_unix_date`, `next_day`, `make_date`, `last_day` | `CometScalarFunction` wrappers | | E | `convert_timezone`, `unix_timestamp`, `from_unixtime`, `timestamp_seconds` | Timezone / unix conversion | | F | `trunc`, `date_trunc` | Truncation | | G | `date_format`, `unix_date` | Formatting / date conversion | | H | `days`, `hours` (out of scope: V2 partition transforms with no SQL name) | Iceberg transforms | ## Consistency findings for follow-up The audit also surfaced support-level / reason-consistency issues per the rules in #4447. They are documentation-only here; the fixes will land as a follow-up PR. Highlights: **Behaviour gaps not surfaced anywhere:** - `CometNextDay`: does not throw under ANSI mode (Spark does); also `trim()`s the day-of-week argument that Spark does not. - `CometMakeDate`: does not throw under ANSI mode (Spark does); also accepts year 0 and negative years that Spark rejects. - `CometUnixTimestamp`: `getUnsupportedReasons` claims `TimestampNTZType is not supported`, but `isSupportedInputType` returns `true` for it. Either the predicate or the reason is wrong. **Reason / support-level alignment (per the new skill rules):** - `CometHour`: inline string in `getSupportLevel` drifts from the `incompatReason` constant. - `CometMinute`, `CometSecond`: TimestampNTZ reason duplicated in `getIncompatibleReasons` and `getSupportLevel` with no shared constant; the three serdes share an identical string that should be hoisted into a private companion (mirroring `UTCTimestampSerde`). - `CometFromUnixTime`: `Incompatible(None)` drops the reason from EXPLAIN output even though `getIncompatibleReasons` populates one. - `CometTruncDate` / `CometTruncTimestamp`: incompatible-reason text drift, missing `getUnsupportedReasons` override on `CometTruncTimestamp`, no shared `private val` for reasons. - `CometSecondsToTimestamp`: returns `Unsupported(Some(...))` without overriding `getUnsupportedReasons`. - `CometDateFormat`: `getSupportLevel` returns `Compatible()` while `convert` reads `allowIncompatible` and branches on UTC vs non-UTC; per the skill, this gating belongs in `getSupportLevel`. Also no `getUnsupportedReasons` override despite multiple fallback branches. - `CometDays`, `CometHours`: `convert` falls back via `withInfo` for unsupported child types, but `getSupportLevel` / `getUnsupportedReasons` are not overridden so the reason never reaches EXPLAIN or the Compatibility Guide. **Informational:** - The 9 codegen-dispatched expressions in Group A report `Compatible()` while their `convert` returns `None` when `spark.comet.exec.scalaUDF.codegen.enabled=false`. The dispatcher flag is not surfaced in the compatibility guide. ## How are these changes tested? Documentation-only change. `prettier --check` passes on the modified file. No code paths are modified. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
