jiayuasu opened a new pull request, #2649: URL: https://github.com/apache/sedona/pull/2649
## Did you read the Contributor Guide?

- Yes, I have read the [Contributor Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor Development Guide](https://sedona.apache.org/latest/community/develop/)

## Is this PR related to a ticket?

- Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes #2609

## What changes were proposed in this PR?

This PR adds support for Apache Spark 4.1 in Sedona.

### Build scaffolding

- Added a `sedona-spark-4.1` Maven profile in the root `pom.xml` (Spark 4.1.0, Scala 2.13.17, Hadoop 3.4.1)
- Added a `spark-4.1` module entry in `spark/pom.xml` (`enable-all-submodules` profile)
- Added a `sedona-spark-4.1` profile in `spark/common/pom.xml` with the `spark-sql-api` dependency
- Created the `spark/spark-4.1/` module, copied from `spark/spark-4.0/` with an updated `artifactId`

### Differences from Spark 4.0 code

The `spark/spark-4.1/` module is based on `spark/spark-4.0/` with the following differences:

1. **`SedonaArrowEvalPythonExec.scala`**: Spark 4.1 added a new `sessionUUID` parameter to `ArrowPythonWithNamedArgumentRunner`. The Sedona wrapper initializes `sessionUUID` (from `session.sessionUUID` when `pythonWorkerLoggingEnabled` is set) and passes it through `SedonaArrowEvalPythonEvaluatorFactory` to `ArrowPythonWithNamedArgumentRunner` (a sketch of this pass-through appears at the end of this description).
2. **`pom.xml`**: `artifactId` changed from `sedona-spark-4.0_${scala.compat.version}` to `sedona-spark-4.1_${scala.compat.version}`.

All other source files under `spark/spark-4.1/src/` are **identical** to their `spark/spark-4.0/` counterparts.

### Spark 4.1 API compatibility fixes in shared code (`spark/common/`)

3. **`Functions.scala`** - **Geometry import ambiguity**: Spark 4.1 introduces `org.apache.spark.sql.types.Geometry`, which conflicts with `org.locationtech.jts.geom.Geometry` when both packages are wildcard-imported. Fixed by adding an explicit `import org.locationtech.jts.geom.Geometry` after the wildcard imports; the explicit import shadows the wildcards and works for all supported Spark versions (sketched at the end of this description).
4. **`ParquetColumnVector.java`** - **`WritableColumnVector.setAllNull()` removed**: Spark 4.1 replaced `setAllNull()` with `setMissing()`. Added a reflection-based `markAllNull()` helper that tries `setAllNull()` first and falls back to `setMissing()`, keeping the code compatible with Spark 3.x, 4.0, and 4.1 (sketched at the end of this description).

### Build profile details

- Spark 4.1 drops Scala 2.12 support; only Scala 2.13 is supported
- Scala version: 2.13.17 (matching Spark 4.1's own build)
- Requires JDK 17 (same as Spark 4.0)

### Documentation updates

- `docs/setup/maven-coordinates.md`: Added Spark 4.1 tabs for the shaded/unshaded artifacts
- `docs/setup/platform.md`: Added a Spark 4.1 column to the compatibility tables
- `docs/community/publish.md`: Added Spark 4.1 to `SPARK_VERSIONS` and updated the build scripts

### CI updates

- `.github/workflows/java.yml`: Added a Spark 4.1.0 matrix entry
- `.github/workflows/example.yml`: Added a Spark 4.1.0 matrix entry
- `.github/workflows/python.yml`: Added Spark 4.1.0 matrix entries (Python 3.10, 3.11)
- `.github/workflows/docker-build.yml`: Added Spark 4.1.0 to the matrix

## How was this patch tested?

- Verified that a local `mvn clean package -Dspark=4.1 -Dscala=2.13 -DskipTests` succeeds with JDK 17
- Verified that `mvn clean package -Dspark=4.0 -Dscala=2.13 -DskipTests` still succeeds (no regression)
- Verified that `mvn clean package -Dspark=3.5 -DskipTests` still succeeds (no regression)

## Did this PR include necessary documentation updates?

- Yes, I have updated the documentation.
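For reference, a hypothetical and heavily simplified sketch of the `sessionUUID` plumbing described in item 1. The names `pythonWorkerLoggingEnabled` and `sessionUUID` are taken from the description above; `RunnerStandIn` is a stand-in for Spark's `ArrowPythonWithNamedArgumentRunner`, whose real constructor takes many more arguments, and the `Option` wrapping is an assumption rather than the actual parameter type:

```scala
object SessionUuidPlumbingSketch {
  // Stand-in for ArrowPythonWithNamedArgumentRunner; only the new parameter is shown.
  final case class RunnerStandIn(sessionUUID: Option[String] /* , ... */)

  // Mirror of the wrapper behaviour described above: forward the session UUID
  // only when Python worker logging is enabled, otherwise pass nothing.
  def buildRunner(pythonWorkerLoggingEnabled: Boolean, sessionUuid: String): RunnerStandIn = {
    val sessionUUID = if (pythonWorkerLoggingEnabled) Some(sessionUuid) else None
    RunnerStandIn(sessionUUID)
  }
}
```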
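A minimal sketch of the import-shadowing fix from item 3. The imported packages are the real ones involved; the surrounding object and method are illustrative only:

```scala
import org.apache.spark.sql.types._        // on Spark 4.1 this also brings in types.Geometry
import org.locationtech.jts.geom._         // brings in jts.geom.Geometry
import org.locationtech.jts.geom.Geometry  // explicit import takes precedence over the wildcards

object GeometryImportSketch {
  // With the explicit import, `Geometry` unambiguously refers to
  // org.locationtech.jts.geom.Geometry on every supported Spark version,
  // because explicit imports shadow wildcard imports in Scala.
  def describe(geom: Geometry): String = geom.getGeometryType
}
```

This keeps a single shared source tree compiling against Spark 3.x, 4.0, and 4.1 without version-specific shims.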
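And a sketch of the reflection-based fallback from item 4, written in Scala here for consistency with the other snippets even though the real helper lives in `ParquetColumnVector.java`. It assumes both `setAllNull()` and `setMissing()` are public no-arg methods, as described above:

```scala
import org.apache.spark.sql.execution.vectorized.WritableColumnVector

object MarkAllNullSketch {
  // Prefer the pre-4.1 setAllNull(); if it is gone (Spark 4.1+), fall back to
  // its replacement setMissing(). The Method lookup could be cached per class
  // if this ever sits on a hot path.
  def markAllNull(vector: WritableColumnVector): Unit = {
    val method =
      try vector.getClass.getMethod("setAllNull")
      catch { case _: NoSuchMethodException => vector.getClass.getMethod("setMissing") }
    method.invoke(vector)
  }
}
```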
