andygrove opened a new pull request, #4493: URL: https://github.com/apache/datafusion-comet/pull/4493
## Which issue does this PR close? Closes #. ## Rationale for this change Completes the per-category expression audit started in earlier PRs (#4469 struct, #4470 json, #4473 collection, #4474 misc, #4475 conditional, #4476 hash, #4478 map, #4479 bitwise, #4480 predicate, #4483 array, #4486 math), using the updated `audit-comet-expression` skill in #4468. `cast` is the only expression in `conversion_funcs`; the other entries (`bigint`, `binary`, `boolean`, ...) are SQL-syntactic shorthands for `CAST(... AS X)` and resolve to `Cast`. ## What changes are included in this PR? ### Support-doc audit notes Add a per-version audit sub-bullet to `cast` in `docs/source/contributor-guide/spark_expressions_support.md`. Highlights: - `Cast(child, dataType, timeZoneId, evalMode)` signature is stable across 3.4 / 3.5 / 4.0 / 4.1; per-version `EvalMode` resolution lives in `CometExprShim`. - Comet uses a per-source-type support matrix in `CometCast.scala` that returns `Compatible` / `Incompatible(reason)` / `Unsupported(reason)` for each (from, to) pair. - Spark's numeric-to-numeric `canCast` matrix is unchanged across all four versions; Spark 4.0 adds `VariantType` and collated `StringType`; Spark 4.1 adds `TimeType` and geospatial types with their own arms. None of those new arms are in `CometCast`, so they fall back to Spark. ### Support-level consistency fixes None in this PR. The audit surfaced cosmetic inconsistencies (e.g. `canCastFromFloat/Double/Decimal` ignore `evalMode`; the static `getUnsupportedReasons` / `getIncompatibleReasons` don't enumerate the per-pair reasons returned by `isSupported`; one `case _: Incompatible()` with no reason in the array-of-binary branch) but these touch a load-bearing file and are deferred to follow-ups. ### Tracking issues filed for follow-up - #4488 `CAST(<binary> AS STRING)` uses unsafe `String::from_utf8_unchecked` (undefined behaviour for non-UTF8 inputs). - #4489 Spark 4.0 collated-string casts are implicitly unhandled; no test asserts the fallback (umbrella #2190). - #4490 Spark 4.1 `TimeType` casts have no explicit `Unsupported` arm; falls back implicitly but doesn't appear in the auto-generated compat doc. - #4491 `CAST(<map> AS <map>)` falls back even though native `cast_map_to_map` exists. - #4492 `spark.sql.legacy.castComplexTypesToString.enabled=true` is not honoured. Existing #1371 (`CAST(float|double AS decimal)` rounding) and #2190 (Spark 4.0 collation umbrella) are also referenced from the doc sub-bullet. ### Audit process Audited using the `audit-comet-expression` skill (4 Spark versions per #4468), driven by 3 parallel agents covering high-level + numeric, string conversions, and datetime + complex types. ## How are these changes tested? - `make core` succeeds (no code changes; doc only). - Existing test coverage in `CometCastSuite` and the cast-related SQL fixtures remains unchanged. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
