andygrove opened a new pull request, #4509:
URL: https://github.com/apache/datafusion-comet/pull/4509

   ## Which issue does this PR close?
   
   Closes #4006.
   
   Depends on and is stacked on #4508 (the `withInfo` -> `withFallbackReason` 
rename). Because the two branches live on a fork, this PR targets `main` and 
therefore currently includes #4508's rename commit in its diff. Please review 
#4508 first; once it merges, rebase will reduce this PR to just the feature 
commits below.
   
   ## Rationale for this change
   
   Comet only had one way to tag a plan node with a message, and that message 
always meant "this node falls back to Spark". There was no way to attach a 
purely informational note that does not trigger fallback. This is increasingly 
useful with codegen dispatch: when Comet runs a JVM implementation of an 
expression even though a faster native implementation exists behind a config, 
we want to tell the user about the faster path without that note being treated 
as a fallback.
   
   ## What changes are included in this PR?
   
   - A new informational channel, parallel to the fallback channel freed up by 
#4508:
     - `CometSparkSessionExtensions.withInfo(node, message)` records a message 
on a new `CometExplainInfo.EXTENSION_INFO` tag. It does not cause fallback: no 
planning rule reads this tag.
     - Verbose extended explain renders these as a distinct `[COMET-INFO: ...]` 
segment, in addition to any `[COMET: ...]` fallback segment on the same node. 
The fallback explain list format is unchanged and still excludes info messages.
   - Expression-level info messages are lifted onto the converted operator node 
in `CometExecRule.convertToComet` (a single central rollup, applied to all 
native operators), because verbose explain only traverses plan nodes, not 
expressions.
   - First consumer: `CometDateFormat` emits a `[COMET-INFO: ...]` hint when a 
natively-supported format is requested but native execution is gated off 
(non-UTC session timezone with `allowIncompatible` disabled), so Comet runs the 
JVM codegen path. The hint names the exact config key to enable the faster 
native path.
   
   Known limitation for future work: the Spark 4.x `CometExprShim` node 
reconstruction copies `FALLBACK_REASONS` but not `EXTENSION_INFO` onto the 
wrapping `Invoke`. No current code path routes `withInfo` through those shims, 
so this is latent. It can be addressed if a future serde tags one of those 
reconstructed nodes.
   
   ## How are these changes tested?
   
   New tests in `CometExpressionSuite`:
   - `withInfo` does not set a fallback reason and renders as `[COMET-INFO: 
...]` in verbose explain, and a second message accumulates rather than 
overwriting.
   - `date_format` takes the JVM codegen path under a non-UTC timezone and 
surfaces the `[COMET-INFO: ...]` hint naming the 
`DateFormatClass.allowIncompatible` config key.
   
   The full `CometExpressionSuite` passes (125 succeeded), confirming the 
central `convertToComet` rollup does not regress operator conversion. 
`scalastyle:check` passes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to