andygrove opened a new pull request, #4721:
URL: https://github.com/apache/datafusion-comet/pull/4721

   ## Which issue does this PR close?
   
   Closes #4006.
   
   ## Rationale for this change
   
   Several Comet expressions have two execution paths: a JVM codegen-dispatch 
path that runs Spark's own generated code inside the Comet pipeline and is 
always Spark-compatible, and a faster native (Rust) implementation that differs 
from Spark for some inputs and is gated behind a config. Comet runs the 
compatible path by default, but a user has no way to discover, for a given 
query, that a faster native opt-in exists short of reading the compatibility 
guide. Comet also had only one way to annotate a plan node (a fallback reason), 
so there was no channel for a purely informational hint.
   
   This PR frames these expressions as "compatible by default, opt into native 
if you accept the documented differences" and surfaces that opt-in directly in 
the query plan.
   
   ## What changes are included in this PR?
   
   - A non-fallback informational channel: 
`CometSparkSessionExtensions.withInfo` records messages on a new 
`EXTENSION_INFO` tag, and verbose extended explain renders them as a 
`[COMET-INFO: ...]` segment, distinct from the `[COMET: ...]` fallback segment. 
`CometExecRule` rolls expression-level info up onto the operator node so it 
appears in explain output.
   - `SupportLevel.Compatible` gains an optional `nativeOptIn` field 
(`NativeOptIn(configKey)`) with a shared message builder so the docs and the 
runtime hint cannot drift. A `NativeOptInAvailable` marker trait (extended by 
`CodegenDispatchFallback`) is the single static signal used for docs detection.
   - `QueryPlanSerde` emits the hint centrally in the two existing dispatch 
branches, with no per-serde imperative calls. The hint shows only when enabling 
the config would actually switch that specific expression instance to native: 
RLike only with a literal pattern, RegExpReplace only with offset 1, 
date_format only for non-UTC sessions, and so on. A single shared predicate per 
input-dependent serde drives both `getSupportLevel` and `convert()`.
   - 12 expressions now advertise the opt-in: RLike, RegExpReplace, 
StringSplit, InitCap, Upper, Lower, StringReplace, GetJsonObject, 
LengthOfJsonArray, StructsToJson, JsonToStructs, and DateFormat. Their 
`getIncompatibleReasons` are restored so the compatibility guide documents the 
native differences users accept when they opt in.
   - `GenerateDocs` renders a compatible-by-default header for these 
expressions, keyed off the marker, and derives the gating config key (so 
Upper/Lower show `spark.comet.caseConversion.enabled`). The 
`getIncompatibleReasons` scaladoc is clarified, and a short "compatible by 
default, opt in to native" narrative is added to the compatibility index page.
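
The per-instance gating described above (one shared predicate per input-dependent serde driving both `getSupportLevel` and `convert()`) can be sketched roughly as follows. This is a minimal, self-contained model and not the code in this PR: `RLikeCall`, `RLikeSerdeSketch`, `nativeEligible`, `hint`, and the config key `spark.comet.example.rlike.enabled` are hypothetical names for illustration, and the real `SupportLevel` hierarchy has more cases than shown here.

```scala
// Hypothetical, simplified stand-ins for the types named in this PR.
final case class NativeOptIn(configKey: String) {
  // Single message builder, so generated docs and the runtime hint share one wording.
  def hint(exprName: String): String =
    s"$exprName is running on the Spark-compatible path; " +
      s"set $configKey=true to opt in to the native implementation"
}

sealed trait SupportLevel
final case class Compatible(nativeOptIn: Option[NativeOptIn] = None) extends SupportLevel

// Hypothetical model of one RLike instance: only "is the pattern a literal?" matters here.
final case class RLikeCall(patternIsLiteral: Boolean)

object RLikeSerdeSketch {
  // Placeholder key for illustration only; not a real Comet config name.
  val optIn: NativeOptIn = NativeOptIn("spark.comet.example.rlike.enabled")

  // The single shared predicate: could the native path handle this instance at all?
  def nativeEligible(call: RLikeCall): Boolean = call.patternIsLiteral

  // Advertise the opt-in only when enabling the config would change the path taken.
  def getSupportLevel(call: RLikeCall): SupportLevel =
    if (nativeEligible(call)) Compatible(Some(optIn)) else Compatible()

  // Conversion consults the same predicate, so the hint and the behaviour cannot disagree.
  def convert(call: RLikeCall, optInEnabled: Boolean): String =
    if (optInEnabled && nativeEligible(call)) "native" else "codegen-dispatch"
}
```

In this model, an instance with a non-literal pattern never carries a `NativeOptIn`, so no hint would be emitted for it, and `convert` stays on the codegen-dispatch path even when the config is enabled.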
   
   ## How are these changes tested?
   
   - New unit and explain tests in `CometExpressionSuite`: the info channel 
renders `[COMET-INFO]` without setting a fallback reason and accumulates rather 
than overwriting; the hint appears on the codegen-dispatch path (Hour on 
TimestampNTZ); per-instance precision is verified (RLike with a literal pattern 
shows the hint, a non-literal pattern does not; date_format shows it for a 
non-UTC session and is suppressed for UTC, where native already runs); the hint 
uses the dedicated `spark.comet.caseConversion.enabled` key for Upper.
   - Plan-stability golden files regenerated for Spark 3.5.
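
The first property in the list above (info messages render as `[COMET-INFO: ...]`, accumulate, and never set a fallback reason) can be illustrated with a small standalone model. This is a sketch under assumptions, not the test code in `CometExpressionSuite`: `PlanNodeSketch`, `InfoChannelSketch`, and `render` are hypothetical, and the separator and ordering of the two segments are guesses rather than the actual explain format.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a plan node carrying two independent annotation channels.
final class PlanNodeSketch(val name: String) {
  val fallbackReasons: mutable.LinkedHashSet[String] = mutable.LinkedHashSet.empty
  val infoMessages: mutable.LinkedHashSet[String] = mutable.LinkedHashSet.empty
}

object InfoChannelSketch {
  // Adds to the info channel only; existing messages are kept, not overwritten.
  def withInfo(node: PlanNodeSketch, message: String): PlanNodeSketch = {
    node.infoMessages += message
    node
  }

  // Renders each non-empty channel as its own bracketed segment after the node name.
  def render(node: PlanNodeSketch): String = {
    val fallback =
      if (node.fallbackReasons.isEmpty) ""
      else node.fallbackReasons.mkString(" [COMET: ", ", ", "]")
    val info =
      if (node.infoMessages.isEmpty) ""
      else node.infoMessages.mkString(" [COMET-INFO: ", ", ", "]")
    node.name + fallback + info
  }
}
```

Calling `withInfo` twice on the same node leaves `fallbackReasons` empty and yields a single `[COMET-INFO: ...]` segment containing both messages, which mirrors the accumulate-without-fallback behaviour the new tests check.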
   
   ---
   
   Draft: this push uses `[skip ci]`. Plan-stability golden files for Spark 3.4 
/ 4.0 / 4.1 / 4.2 still need regenerating (the same `[COMET-INFO]` additions on 
q24a / q24b / q24 for `Upper`) and will be added before this is marked ready 
for review.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

