andygrove opened a new issue, #4501:
URL: https://github.com/apache/datafusion-comet/issues/4501

   Tracking issue for follow-up work surfaced by the cast expression audit in 
#4493. Each item is a support-level / serde consistency fix that the audit 
deliberately deferred because `CometCast.scala` is load-bearing.
   
   The relevant file is 
`spark/src/main/scala/org/apache/comet/expressions/CometCast.scala`.
   
   ## High priority
   
   ### 1. `canCastFromFloat` / `canCastFromDouble` / `canCastFromDecimal` 
ignore `evalMode`
   
   `canCastFromFloat` (`CometCast.scala:376-385`), `canCastFromDouble` 
(`CometCast.scala:387-395`), and `canCastFromDecimal` 
(`CometCast.scala:397-403`) take no `evalMode` argument, even though their 
callers in `isSupported` (`CometCast.scala:176, 188, 190`) thread `evalMode` 
through every other arm. Spark's overflow / null behaviour for float and double 
to integral casts differs between `LEGACY`, `ANSI`, and `TRY`, and the native 
side has explicit per-`evalMode` branches for these narrowing casts. The 
support level should reflect the eval mode so that EXPLAIN and the 
auto-generated compatibility doc carry the right reason per mode, and so the 
support matrix matches what the native code actually does.
   
   ### 2. `getUnsupportedReasons` / `getIncompatibleReasons` do not enumerate 
per-pair reasons
   
   `getUnsupportedReasons` (`CometCast.scala:55-56`) and 
`getIncompatibleReasons` (`CometCast.scala:51-53`) return a single generic 
sentence each. The per-pair `isSupported` matrix returns specific 
`Unsupported(Some(reason))` / `Incompatible(Some(reason))` values that never 
reach the static API, so the auto-generated compatibility docs and the 
dispatcher cannot surface them. The static reasons should enumerate the actual 
per-pair reasons (or the API should be changed so callers get the per-pair 
reason directly).
   
   ### 3. `case _: Incompatible()` with no reason in the array-of-binary branch
   
   `CometCast.scala:154-155` returns a bare `Incompatible()` for 
`CAST(ARRAY<BINARY> AS STRING)` with no reason string. The audit skill requires 
every `Incompatible` to carry a concise reason that names the divergence, so 
EXPLAIN and the compatibility doc can show why. The reason here is the same 
`String::from_utf8_unchecked` UB tracked in #4488; fill it in and 
cross-reference that issue.
   
   ## Medium priority
   
   ### 4. Literal short-circuit may swallow per-pair `Incompatible` reasons
   
   `getSupportLevel` (`CometCast.scala:58-66`) short-circuits any 
`cast.child.isInstanceOf[Literal]` to `Compatible()` and defers to 
`CometLiteral`. For cases that the per-pair matrix would have marked 
`Incompatible(Some(...))` (for example float-to-decimal rounding under #1371), 
the short-circuit hides the reason from EXPLAIN even though the runtime 
semantics still differ. Worth confirming whether `CometLiteral` re-checks the 
(from, to) pair or whether literal casts genuinely bypass these divergences, 
and if not, route through `isSupported` instead.
   
   ---
   
   Surfaced by the `audit-comet-expression` skill run in #4493.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to