manuzhang opened a new pull request, #2829:
URL: https://github.com/apache/datafusion-comet/pull/2829

   ## Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and 
enhancements and this helps us generate change logs for our releases. You can 
link an issue to this PR using the GitHub syntax. For example `Closes #123` 
indicates that this PR will close issue #123.
   -->
   
   First step of #2792.
   
   ## Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in 
the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your 
changes and offer better suggestions for fixes.
   -->
   
   ## What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is 
sometimes worth providing a summary of the individual changes in this PR.
   -->
   1. Add a `spark-4.1` profile with minor shim version `spark-4.1`.
   2. Move `src/main/spark-4.0` to `src/main/spark-4.x` for shim classes shared by spark-4.0 and spark-4.1.
   3. Add `CometSumShim` and `ShimSQLConf` as the spark-4.0- and spark-4.1-specific shims, respectively.
   4. Add `MapStatusBuilder.scala` to access `org.apache.spark.scheduler.MapStatus` from Java. `MapStatus` gained a new constructor argument in Spark 4.1 and kept source compatibility only for Scala code (see the sketch after this list).
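
   As an illustration of item 4, here is a minimal, hypothetical sketch of what a Java-friendly builder for `MapStatus` could look like. The method name `build` and the exact parameter list are assumptions for illustration; the actual `MapStatusBuilder.scala` added in this PR may differ.

   ```scala
   package org.apache.spark.scheduler

   import org.apache.spark.storage.BlockManagerId

   // Hypothetical sketch of MapStatusBuilder.scala; the real file may differ.
   // MapStatus gained a constructor argument in Spark 4.1 and, per the PR
   // description, stays source-compatible only for Scala callers, so this
   // Scala-side helper gives Java code a stable entry point.
   object MapStatusBuilder {
     // Mirrors the pre-4.1 MapStatus.apply signature (assumed here); the new
     // 4.1 argument is filled in on the Scala side.
     def build(
         loc: BlockManagerId,
         uncompressedSizes: Array[Long],
         mapTaskId: Long): MapStatus = {
       MapStatus(loc, uncompressedSizes, mapTaskId)
     }
   }
   ```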
   
   ---
   
   ### Summary of Changes in 4.1.1.diff (generated by Copilot)
   
   This patch updates the Comet project to support Spark 4.1. The changes largely involve build configuration updates, hooks into Spark's session initialization, and extensive test exclusions for features that are not yet supported or that behave differently under Comet.
   
   #### Build Configuration
   - **pom.xml**:
     - Updated `spark.version.short` to `4.1`.
     - Added dependency for `comet-spark-spark4.1`.
   - **sql/core/pom.xml**:
     - Added `comet-spark-spark4.1` dependency to the SQL core module.
   
   #### Core Spark Integration
   - **SparkSession.scala**:
     - Added an `isCometEnabled` check (Comet defaults to enabled when the `ENABLE_COMET` env var is unset or set to `true`).
     - Added `loadCometExtension` to automatically inject `org.apache.comet.CometSparkSessionExtensions` when Comet is enabled (see the sketch after this list).
   - **SparkPlanInfo.scala**:
     - Added support for `CometScanExec` to properly extract metadata for Spark 
UI/history.
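
   For context, a minimal standalone sketch of the behavior described above, written against the public `SparkSession.Builder` API. The real change patches Spark's `SparkSession.scala` directly through `4.1.1.diff`; the helper object and method placement below are illustrative only.

   ```scala
   import org.apache.spark.sql.SparkSession

   // Illustrative sketch only; the actual hook lives inside Spark's SparkSession.scala.
   object CometExtensionLoader {

     // Comet defaults to enabled when ENABLE_COMET is unset or set to "true".
     def isCometEnabled: Boolean =
       sys.env.get("ENABLE_COMET").forall(_.equalsIgnoreCase("true"))

     // Inject the Comet extension into a session builder when Comet is enabled.
     def loadCometExtension(builder: SparkSession.Builder): SparkSession.Builder =
       if (isCometEnabled) {
         builder.config(
           "spark.sql.extensions",
           "org.apache.comet.CometSparkSessionExtensions")
       } else {
         builder
       }
   }
   ```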
   
   #### Test Infrastructure
   - **IgnoreComet.scala**:
     - Introduced new Scalatest tags: `IgnoreComet`, 
`IgnoreCometNativeIcebergCompat`, `IgnoreCometNativeDataFusion`, 
`IgnoreCometNativeScan`.
     - Added an `IgnoreCometSuite` trait to disable entire test suites when Comet is enabled (a sketch follows this list).
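
   A rough sketch of how such tags and the suite-level switch could be defined. How Comet is detected and which test base class the trait builds on are assumptions here; the real `IgnoreComet.scala` may differ.

   ```scala
   import org.scalatest.Tag
   import org.scalatest.funsuite.AnyFunSuite

   // Illustrative sketch; the real IgnoreComet.scala may differ.
   // A tag marking individual tests to be skipped when Comet is enabled.
   case class IgnoreComet(reason: String) extends Tag("IgnoreComet")

   // Turns every test in a mixed-in suite into an ignored test when Comet is on.
   trait IgnoreCometSuite extends AnyFunSuite {

     private def cometEnabled: Boolean =
       sys.env.get("ENABLE_COMET").forall(_.equalsIgnoreCase("true"))

     override protected def test(testName: String, testTags: Tag*)(testFun: => Any)(
         implicit pos: org.scalactic.source.Position): Unit = {
       if (cometEnabled) {
         ignore(testName, testTags: _*)(testFun)
       } else {
         super.test(testName, testTags: _*)(testFun)
       }
     }
   }
   ```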
   
   #### Test Exclusions
   A large number of existing Spark SQL tests have been modified to be skipped when Comet is enabled; a hypothetical tagging example follows the list. Common reasons cited in the diff include:
   - **Unsupported Features**:
     - SubqueryBroadcastExec (#1737, #242)
     - Spill support in SortMergeJoin
     - Native CometSort peak execution memory updates
     - Datetime rebase mode
     - Parquet column index
     - RLE encoding, DELTA encoding
     - Variant types (#2209)
   - **Behavioral Differences**:
     - Explain output differences
     - Shuffle partition size and metrics changes
     - Exception handling (e.g. Cast context)
   - **Pending Fixes**:
     - References to various GitHub issues (e.g., #1948, #1947, #2218, #551).
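
   As a hypothetical example of how an individual test could then be excluded (the concrete exclusions in `4.1.1.diff` modify existing Spark suites rather than adding a suite like this one):

   ```scala
   import org.scalatest.Tag
   import org.scalatest.funsuite.AnyFunSuite

   // Same hypothetical tag as in the earlier sketch, repeated so this example
   // is self-contained.
   case class IgnoreComet(reason: String) extends Tag("IgnoreComet")

   class ExampleQuerySuite extends AnyFunSuite {
     // Tagged tests get skipped either by a `test` override such as the
     // IgnoreCometSuite sketch above, or by excluding the tag at the runner
     // level (e.g. ScalaTest's `-l IgnoreComet` argument).
     test("explain output", IgnoreComet("Comet explain output differs")) {
       assert(2 + 2 == 4)
     }
   }
   ```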
   
   #### Modified Test Files
   The following logical areas of tests had significant exclusions:
   - Adaptive Query Execution (AQE)
   - Join Suites (DataFrameJoin, JoinHint, JoinSuite)
   - Parquet Data Source (Filter, Encoding, Query, Schema)
   - SQL Query execution and metrics
   - Collation support
   
   ## How are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are 
they covered by existing tests)?
   -->
   Added unit tests.
   

