andygrove opened a new pull request, #4448:
URL: https://github.com/apache/datafusion-comet/pull/4448

   ## Which issue does this PR close?
   
   Closes #.
   
   ## Rationale for this change
   
   Only `from_utc_timestamp` and `to_utc_timestamp` had per-version audit 
sub-bullets in `spark_expressions_support.md`. The rest of the `datetime_funcs` 
section is implemented but undocumented as to which Spark versions it has been 
validated against. This PR brings every implemented `[x]` date/time function 
(38 SQL names backed by 33 Comet serde objects) under the same audit format so 
contributors can see at a glance which versions have been checked and what 
known divergences exist.
   
   This is the first multi-expression application of the 
`audit-comet-expression` skill following the consistency-rule clarifications 
added in #4447.
   
   ## What changes are included in this PR?
   
   Adds sub-bullets to every `[x]` entry under `### datetime_funcs` in 
`docs/source/contributor-guide/spark_expressions_support.md`, except the five 
entries that have no dedicated Comet serde (`current_timezone`, `date_part`, 
`datepart`, `extract`, `localtimestamp` route through Spark optimizer rewrites 
to other expressions or evaluate to constants).
   
   For each function, the sub-bullets record:
   
   - Whether the Spark class is identical across 3.4.3, 3.5.8, and 4.0.1.
   - Spark 4.0 changes. Universally the `NullIntolerant` trait migrated to a 
`nullIntolerant: Boolean` field override; many string-typed `inputTypes` 
widened to `StringTypeWithCollation(supportsTrimCollation = true)`; ANSI error 
helpers were renamed.
   - Known divergences between Comet and Spark, with tracking-issue links where 
applicable.
   
   The work was driven by 8 parallel agents using the audit-comet-expression 
skill, each handling a related group of expressions:
   
   | Group | Expressions | Pattern |
   | --- | --- | --- |
   | A | `add_months`, `months_between`, `make_timestamp`, `timestamp_micros`, 
`timestamp_millis`, `unix_seconds`, `unix_millis`, `unix_micros`, 
`to_unix_timestamp` | JVM codegen dispatcher |
   | B | `year`, `month`, `day`, `dayofmonth`, `dayofweek`, `weekday`, 
`dayofyear`, `weekofyear`, `quarter` | `CometExprGetDateField` |
   | C | `hour`, `minute`, `second` | Timezone-aware field extractors |
   | D | `date_add`, `dateadd`, `date_sub`, `date_diff`, `datediff`, 
`date_from_unix_date`, `next_day`, `make_date`, `last_day` | 
`CometScalarFunction` wrappers |
   | E | `convert_timezone`, `unix_timestamp`, `from_unixtime`, 
`timestamp_seconds` | Timezone / unix conversion |
   | F | `trunc`, `date_trunc` | Truncation |
   | G | `date_format`, `unix_date` | Formatting / date conversion |
   | H | `days`, `hours` (out of scope: V2 partition transforms with no SQL 
name) | Iceberg transforms |
   
   ## Consistency findings for follow-up
   
   The audit also surfaced support-level / reason-consistency issues per the 
rules in #4447. They are documentation-only here; the fixes will land as a 
follow-up PR. Highlights:
   
   **Behaviour gaps not surfaced anywhere:**
   
   - `CometNextDay`: does not throw under ANSI mode (Spark does); also 
`trim()`s the day-of-week argument that Spark does not.
   - `CometMakeDate`: does not throw under ANSI mode (Spark does); also accepts 
year 0 and negative years that Spark rejects.
   - `CometUnixTimestamp`: `getUnsupportedReasons` claims `TimestampNTZType is 
not supported`, but `isSupportedInputType` returns `true` for it. Either the 
predicate or the reason is wrong.
   
   **Reason / support-level alignment (per the new skill rules):**
   
   - `CometHour`: inline string in `getSupportLevel` drifts from the 
`incompatReason` constant.
   - `CometMinute`, `CometSecond`: TimestampNTZ reason duplicated in 
`getIncompatibleReasons` and `getSupportLevel` with no shared constant; the 
three serdes share an identical string that should be hoisted into a private 
companion (mirroring `UTCTimestampSerde`).
   - `CometFromUnixTime`: `Incompatible(None)` drops the reason from EXPLAIN 
output even though `getIncompatibleReasons` populates one.
   - `CometTruncDate` / `CometTruncTimestamp`: incompatible-reason text drift, 
missing `getUnsupportedReasons` override on `CometTruncTimestamp`, no shared 
`private val` for reasons.
   - `CometSecondsToTimestamp`: returns `Unsupported(Some(...))` without 
overriding `getUnsupportedReasons`.
   - `CometDateFormat`: `getSupportLevel` returns `Compatible()` while 
`convert` reads `allowIncompatible` and branches on UTC vs non-UTC; per the 
skill, this gating belongs in `getSupportLevel`. Also no 
`getUnsupportedReasons` override despite multiple fallback branches.
   - `CometDays`, `CometHours`: `convert` falls back via `withInfo` for 
unsupported child types, but `getSupportLevel` / `getUnsupportedReasons` are 
not overridden so the reason never reaches EXPLAIN or the Compatibility Guide.
   
   **Informational:**
   
   - The 9 codegen-dispatched expressions in Group A report `Compatible()` 
while their `convert` returns `None` when 
`spark.comet.exec.scalaUDF.codegen.enabled=false`. The dispatcher flag is not 
surfaced in the compatibility guide.
   
   ## How are these changes tested?
   
   Documentation-only change. `prettier --check` passes on the modified file. 
No code paths are modified.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to