andygrove opened a new issue, #4616:
URL: https://github.com/apache/datafusion-comet/issues/4616

   ## Background
   
   Several expressions now have both a native (Rust) implementation and a JVM 
codegen-dispatch fallback (mixing in `CodegenDispatchFallback`). For these, the 
execution path is selected by a combination of the per-input support level and 
two config flags:
   
   - `spark.comet.expression.<Name>.allowIncompatible`
   - `spark.comet.exec.scalaUDF.codegen.enabled` (default `true`)
   
   This produces a routing matrix with five outcomes:
   
   | # | Support level for the input | `allowIncompatible` | `codegen.enabled` 
| Resulting path | What a test should assert |
   |---|---|---|---|---|---|
   | 1 | Compatible | any | any | native Rust | native + matches Spark |
   | 2 | Incompatible | true | any | native Rust | stays in Comet only (results 
may diverge) |
   | 3 | Incompatible | false | true (default) | JVM codegen dispatch | native 
+ matches Spark |
   | 4 | Incompatible | false | false | Spark fallback | fallback reason + 
matches Spark |
   | 5 | Unsupported | any | any | Spark fallback | fallback reason + matches 
Spark |
   
   ## Problem
   
   Coverage of this matrix is currently ad hoc and inconsistent across 
expressions. `date_format` in `CometTemporalExpressionSuite` covers the full 
trio for rows 2/3/4, but most other dual-impl expressions only assert the 
default dispatch path (row 3). The fallback-when-dispatcher-disabled path (row 
4) and the native-when-allowIncompatible path (row 2) are easy to drop 
silently, since a path-agnostic `checkSparkAnswer` passes regardless of which 
engine ran.
   
   This was observed while reviewing #4538: tightening the NTZ 
`hour`/`minute`/`second` and non-UTC `date_trunc` tests to assert the dispatch 
path inadvertently removed any assertion that those cases still fall back 
correctly when the dispatcher is disabled. That specific gap was fixed in 
#4538, but the same gap likely exists for the other expression families that 
route through the dispatcher (string, collection, map, json, etc.).
   
   ## Proposal
   
   Establish a consistent testing convention for dual-impl (native + 
codegen-dispatch) expressions so every such expression covers the applicable 
rows of the matrix above. Likely shape:
   
   - A small set of shared test helpers that encode the matrix once, for 
example:
     - `checkNativeCompatible(sql)` (row 1)
     - `checkStaysNativeWhenAllowIncompatible(sql, exprName)` (row 2, no 
equality)
     - `checkCodegenDispatch(sql)` (row 3)
     - `checkFallbackWhenCodegenDisabled(sql)` (row 4)
     - `checkUnsupportedFallback(sql, reason)` (row 5)
   - Per-expression suites call the relevant helpers, so the intended path for 
each input shape is explicit and a missing quadrant is obvious.
   
   Then retrofit the existing dual-impl expressions (those mixing in 
`CodegenDispatchFallback`) to use the convention.
   
   ## Scope notes
   
   - Out of scope for #4538, which only restored the matrix coverage for the 
temporal expressions it touched.
   - Follow-on to #4538: https://github.com/apache/datafusion-comet/pull/4538
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to