andygrove opened a new pull request, #4762: URL: https://github.com/apache/datafusion-comet/pull/4762
## Which issue does this PR close?

Documentation refresh, not tied to a single issue. The notes updated here correspond to already-closed issues #4463, #4464, #4465, #4481, #4482, #4554, and #4681.

## Rationale for this change

A sweep of `docs/source/contributor-guide/expression-audits` found that several pages still describe correctness gaps as live, even though the underlying issues have been fixed and merged. Audit pages are meant to describe current behavior, so a reader (or the `audit-comet-expression` skill) would otherwise be misled into thinking these expressions still diverge from Spark on the default config.

## What changes are included in this PR?

Updated audit notes to reflect the current code:

- `array_contains` / `array_distinct` / `array_except` / `array_max` / `array_min` / `array_union`: float/double arrays with NaN and signed zero now match Spark, because DataFusion canonicalizes them. The stale "Known divergence" notes (#4481, #4482) are replaced with the current behavior. The `array_union` ordering caveat is also resolved (#4681).
- `array_intersect`: now reports `Incompatible` with a codegen-dispatch fallback, so it is Spark-correct by default; the native path (different element ordering) is used only when incompatible expressions are explicitly allowed.
- `bit_length` / `octet_length`: `BinaryType` input is now reported `Unsupported` and falls back to Spark cleanly instead of failing in native execution (#4464).
- `translate`: now reports `Incompatible` (graphemes vs code points, U+0000 handling) and falls back by default (#4463).
- `decode`: now routed through the codegen dispatcher on all versions, honouring the `charset` argument and the Spark 4.0 `legacyCharsets` / `legacyErrorAction` flags (#4465).
- `try_make_timestamp`: now routed through the codegen dispatcher, returning NULL for invalid inputs to match Spark (#4554).
- `from_utc_timestamp` / `to_utc_timestamp`: now report `Incompatible` with a codegen-dispatch fallback, so Spark's legacy zone forms (`GMT+1`, `UTC+1`, `PST`) are Spark-correct by default; the native parser (IANA / `+HH:MM` only) is used only under the opt-in path (#2013).

## How are these changes tested?

Documentation-only change. Each updated note was verified against the current serde / native implementation (`getSupportLevel` gating, codegen-dispatch wiring, and the DataFusion canonicalization behavior).
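Several of the notes above describe the same gating pattern: an expression reports a support level, and that level decides whether the native path, the codegen dispatcher, or plain Spark evaluates it. The sketch below is only an illustration of that decision as described in this PR, not Comet's actual code. `SupportLevel`, `choose_path`, `allow_incompatible`, and `has_codegen_fallback` are hypothetical names invented for the example.

```python
from enum import Enum


class SupportLevel(Enum):
    # Hypothetical stand-ins for the levels named in the audit notes.
    COMPATIBLE = "Compatible"
    INCOMPATIBLE = "Incompatible"
    UNSUPPORTED = "Unsupported"


def choose_path(level, allow_incompatible=False, has_codegen_fallback=False):
    """Return which path evaluates an expression (illustrative only).

    - Compatible: always native.
    - Incompatible: native only when the user has opted in; otherwise the
      codegen dispatcher if one is wired up, else Spark.
    - Unsupported: always falls back to Spark.
    """
    if level is SupportLevel.COMPATIBLE:
        return "native"
    if level is SupportLevel.INCOMPATIBLE:
        if allow_incompatible:
            return "native"
        return "codegen-dispatch" if has_codegen_fallback else "spark"
    return "spark"


if __name__ == "__main__":
    # array_intersect as described above: Incompatible with a codegen fallback.
    print(choose_path(SupportLevel.INCOMPATIBLE, has_codegen_fallback=True))
    # bit_length on BinaryType as described above: Unsupported.
    print(choose_path(SupportLevel.UNSUPPORTED))
```

Under these assumptions, the default configuration never reaches the native path for an `Incompatible` expression, which is why the notes describe such expressions as Spark-correct by default.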

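The float/double array note rests on canonicalizing NaN and signed zero before values are compared. As a rough illustration of what such canonicalization can look like, the sketch below collapses every NaN bit pattern into one NaN and `-0.0` into `+0.0`, then de-duplicates on the resulting bit patterns. This is an assumption about the general technique, not DataFusion's or Spark's implementation; `canonicalize`, `bits`, and `distinct_canonical` are hypothetical helpers.

```python
import math
import struct


def canonicalize(x: float) -> float:
    """Map every NaN to a single NaN and -0.0 to +0.0 (illustrative rule)."""
    if math.isnan(x):
        return math.nan
    if x == 0.0:
        return 0.0  # -0.0 == 0.0 is True, so this collapses the sign
    return x


def bits(x: float) -> int:
    """Return the raw IEEE 754 bit pattern of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def distinct_canonical(values):
    """Order-preserving distinct that compares canonicalized bit patterns."""
    seen = set()
    out = []
    for v in values:
        c = canonicalize(v)
        key = bits(c)
        if key not in seen:
            seen.add(key)
            out.append(c)
    return out


if __name__ == "__main__":
    # A NaN carrying a non-default payload, built from its bit pattern.
    other_nan = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000001))[0]
    print(distinct_canonical([1.0, math.nan, other_nan, -0.0, 0.0, 1.0]))
```

Comparing bit patterns after canonicalization sidesteps the fact that NaN never compares equal to itself, so the two NaNs and the two zeros in the sample input each collapse to a single element.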