andygrove opened a new issue, #4500:
URL: https://github.com/apache/datafusion-comet/issues/4500

   Tracking issue for follow-up work surfaced by the math expression audit in 
#4486. Each item below is either a support-level / serde correctness fix that 
the audit deliberately deferred, or a coverage gap the audit documented but did 
not implement.
   
   ## High priority
   
   ### 1. Lift `convert`-time `withInfo` fallbacks in `arithmetic.scala` into 
`getSupportLevel`
   
   `CometAdd`, `CometSubtract`, `CometMultiply`, `CometDivide`, 
`CometIntegralDivide`, `CometRemainder`, `CometUnaryMinus`, and `CometRound` 
(`spark/src/main/scala/org/apache/comet/serde/arithmetic.scala:94-271, 300-321, 
355`) currently bail out from `convert` with `withInfo(...) + return None` for 
cases that are knowable from the expression alone (unsupported datatype, 
`EvalMode.TRY` on `Remainder`, negative-scale `Decimal` on `Round`, `Float` / 
`Double` on `Round`, etc.).
   
   The `audit-comet-expression` skill requires expression-shape restrictions to 
be declared as `Unsupported(Some(reason))` / `Incompatible(Some(reason))` 
branches in `getSupportLevel` so that:
   - EXPLAIN surfaces the reason at planning time
   - the auto-generated compatibility doc picks them up
   - the dispatcher can route around them
   
   This was deferred from #4486 because `arithmetic.scala` is load-bearing and 
the fix is mechanical but wide.
   
   ### 2. `UnaryPositive` has no serde on Spark 3.4 / 3.5
   
   Projections containing `+col` silently disable Comet for the enclosing 
projection on Spark 3.4 / 3.5 because there is no `CometUnaryPositive` 
registration. On Spark 4.0+ `UnaryPositive` is `RuntimeReplaceable` and the 
optimizer removes it before serde, so the gap is transparent there. Wire a 
passthrough serde for the 3.4 / 3.5 shim path so the projection stays native.
   
   ### 3. `hex` / `unhex` Spark 4.x collation propagation
   
   Spark 4.x widens `Hex` / `Unhex` input types to `StringTypeWithCollation` 
and preserves collation in `dataType`. `CometHex` / `CometUnhex` pass 
`expr.dataType` to the native `SparkHex` / `spark_unhex` UDFs, which always 
return `Utf8`, so collation does not propagate. Per the audit skill, the 
support level should be `Incompatible(Some(reason))` for non-default collations 
rather than the current `Compatible`. Cross-reference #2190 / #4496.
   
   ## Medium priority (robustness / coverage)
   
   ### 4. `greatest` / `least` do not gate input types
   
   Registered as bare `CometScalarFunction(\"greatest\")` / 
`CometScalarFunction(\"least\")` in `QueryPlanSerde.scala`. Spark accepts 
interval inputs (and a few Spark-specific orderings) that the native UDF may 
not handle; there is no explicit fallback path, so a runtime error from the 
native UDF is the failure mode. Add an input-type guard that falls back to 
Spark for the unsupported cases.
   
   ### 5. `ceil(expr, scale)` / `floor(expr, scale)` (RoundCeil / RoundFloor) 
not wired
   
   The two-argument forms always fall back to Spark. Wire them via the same 
path as `Round`.
   
   ### 6. `BRound` (HALF_EVEN rounding) not wired
   
   Listed as `[ ] bround` in the support doc. Banker's rounding is independent 
of the BigDecimal-via-toString issue that blocks Float / Double `Round`, so the 
integer / decimal paths could be supported.
   
   ### 7. `signum` rejects interval inputs
   
   Spark `Signum` accepts `DayTimeIntervalType` / `YearMonthIntervalType` via 
`inputTypes`; Comet only handles `DoubleType`. Either add an explicit fallback 
or extend the native path.
   
   ---
   
   Surfaced by the `audit-comet-expression` skill run in #4486.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to