andygrove opened a new pull request, #4378: URL: https://github.com/apache/datafusion-comet/pull/4378
## Which issue does this PR close?

Refs #4377.

## Rationale for this change

`CometSparkSessionExtensions.isCometLoaded` runs from `CometScanRule._apply` and `CometExecRule._apply`, i.e. on every plan rule application. The warning added in #4328 therefore fires many times per query for any session that does not register `CometShuffleManager`. In the Spark SQL CI matrix this surfaces as hundreds of duplicate `WARN` lines per affected suite (e.g. `BroadcastJoinSuiteAE` emitted ~304 in a single 4.1.1 run).

Whether those suites should be running with the Comet shuffle manager is a separate question tracked in #4377. This PR addresses the log-spam side of the regression so the warning is still useful as a once-per-session signal.

## What changes are included in this PR?

- Add a synchronized weak set keyed by `SQLConf` in `CometSparkSessionExtensions`.
- `isCometLoaded` now only emits the warning the first time it observes a given session missing the shuffle manager. Entries are dropped when the `SQLConf` is GC'd, so this cannot grow unboundedly across test JVMs.
- Return value behavior is unchanged.

## How are these changes tested?

- `CometSparkSessionExtensionsSuite`: both `isCometLoaded` tests still pass (they assert boolean return values, which are unchanged).
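
For readers unfamiliar with the pattern, the "synchronized weak set" listed under the changes in this PR can be sketched roughly as below. This is a minimal illustration in plain Java, not the code in this PR: `SessionConf` is a hypothetical stand-in for `SQLConf`, and `WarnOnce` / `shouldWarn` are names invented for the example.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

// Hypothetical stand-in for a per-session SQLConf. It does not override
// equals/hashCode, so set membership is decided by object identity.
final class SessionConf {}

// Hypothetical helper: remembers which sessions have already been warned about.
final class WarnOnce {
    // Weak keys: an entry becomes eligible for removal once its conf is no
    // longer strongly reachable, so the set does not keep dead sessions alive.
    // synchronizedSet makes each add() call atomic across threads.
    private final Set<SessionConf> warned = Collections.synchronizedSet(
        Collections.newSetFromMap(new WeakHashMap<SessionConf, Boolean>()));

    // Returns true only the first time a given conf is seen.
    boolean shouldWarn(SessionConf conf) {
        return warned.add(conf);
    }
}
```

A caller would log only when `shouldWarn(conf)` returns `true` and compute its own return value independently, which is how the warning can be deduplicated without changing what the check returns.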
