andygrove opened a new issue, #4501: URL: https://github.com/apache/datafusion-comet/issues/4501
Tracking issue for follow-up work surfaced by the cast expression audit in #4493. Each item is a support-level / serde consistency fix that the audit deliberately deferred because `CometCast.scala` is load-bearing. The relevant file is `spark/src/main/scala/org/apache/comet/expressions/CometCast.scala`. ## High priority ### 1. `canCastFromFloat` / `canCastFromDouble` / `canCastFromDecimal` ignore `evalMode` `canCastFromFloat` (`CometCast.scala:376-385`), `canCastFromDouble` (`CometCast.scala:387-395`), and `canCastFromDecimal` (`CometCast.scala:397-403`) take no `evalMode` argument, even though their callers in `isSupported` (`CometCast.scala:176, 188, 190`) thread `evalMode` through every other arm. Spark's overflow / null behaviour for float and double to integral casts differs between `LEGACY`, `ANSI`, and `TRY`, and the native side has explicit per-`evalMode` branches for these narrowing casts. The support level should reflect the eval mode so that EXPLAIN and the auto-generated compatibility doc carry the right reason per mode, and so the support matrix matches what the native code actually does. ### 2. `getUnsupportedReasons` / `getIncompatibleReasons` do not enumerate per-pair reasons `getUnsupportedReasons` (`CometCast.scala:55-56`) and `getIncompatibleReasons` (`CometCast.scala:51-53`) return a single generic sentence each. The per-pair `isSupported` matrix returns specific `Unsupported(Some(reason))` / `Incompatible(Some(reason))` values that never reach the static API, so the auto-generated compatibility docs and the dispatcher cannot surface them. The static reasons should enumerate the actual per-pair reasons (or the API should be changed so callers get the per-pair reason directly). ### 3. `case _: Incompatible()` with no reason in the array-of-binary branch `CometCast.scala:154-155` returns a bare `Incompatible()` for `CAST(ARRAY<BINARY> AS STRING)` with no reason string. The audit skill requires every `Incompatible` to carry a concise reason that names the divergence, so EXPLAIN and the compatibility doc can show why. The reason here is the same `String::from_utf8_unchecked` UB tracked in #4488; fill it in and cross-reference that issue. ## Medium priority ### 4. Literal short-circuit may swallow per-pair `Incompatible` reasons `getSupportLevel` (`CometCast.scala:58-66`) short-circuits any `cast.child.isInstanceOf[Literal]` to `Compatible()` and defers to `CometLiteral`. For cases that the per-pair matrix would have marked `Incompatible(Some(...))` (for example float-to-decimal rounding under #1371), the short-circuit hides the reason from EXPLAIN even though the runtime semantics still differ. Worth confirming whether `CometLiteral` re-checks the (from, to) pair or whether literal casts genuinely bypass these divergences, and if not, route through `isSupported` instead. --- Surfaced by the `audit-comet-expression` skill run in #4493. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
