[PR] [Feature] Support Spark 4.0 [doris-spark-connector]

via GitHub Thu, 18 Jun 2026 06:37:10 -0700


LuciferYang opened a new pull request, #363:
URL: https://github.com/apache/doris-spark-connector/pull/363


   ## What
   
   Add Spark 4.0.x support to the connector (JDK 17, Scala 2.13), plus the foundational
   changes needed for the shared `base` module to cross-build Scala 2.12 and 2.13.
   
   First of two stacked PRs for [#359](https://github.com/apache/doris-spark-connector/issues/359):
   - **This PR:** Spark 4.0.x + all foundational changes.
   - **Follow-up:** Spark 4.1.x on top of this.
   
   ## Changes
   
   **New modules** (mirroring the `spark-3-base` + thin `spark-3.x` layout):
   - `spark-doris-connector-spark-4-base`: shared Spark 4 DataSource V2 base (fork of `spark-3-base`).
   - `spark-doris-connector-spark-4.0`: Spark 4.0.0 / Scala 2.13.16 adapter.
   
   **Scala 2.13 readiness in `base`** (still builds on 2.12):
   - `DorisRow`: drop the `toSeq` override and inherit Spark's `Row.toSeq` default (avoids a forced copy on 2.13, and is more correct).
   - `DorisRelation` / `DorisArrowUtils` / `spark-4-base` `DorisTableBase`: build `StructType` from an `Array` instead of a mutable `Buffer`.
   - `TestStreamLoadForArrowType`: `parallelize` over a `List`.
   
   **Build / release wiring:**
   - root pom: `spark-4-base` + `spark-4.0` modules, dependencyManagement, and the `spark-4.0` profile (Scala 2.13.16).
   - `build.sh`: Scala 2.13 and Spark 4.0 options; require JDK 17 + Scala 2.13 for Spark 4.x.
   - `build-extension.yml`: a separate JDK 17 job providing a compile + shade gate for the Spark 4.x modules.
   - `deploy_staging_jars.sh`: stage-deploy `spark-4.0` under JDK 17 (`JAVA17_HOME`).
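
The JDK 17 + Scala 2.13 requirement for Spark 4.x could be enforced along the following lines. This is only a sketch: `java_major` and `check_spark4_toolchain` are hypothetical names, not functions from the connector's `build.sh`, and accepting JDKs newer than 17 is an assumption.

```shell
# Hypothetical helpers, not taken from the connector's build.sh.

# Reduce a Java version string to its major version,
# e.g. "1.8.0_392" gives 8 and "17.0.9" gives 17.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy "1.x" scheme used up to Java 8
  esac
  echo "${v%%[._-]*}"     # keep everything before the first '.', '_' or '-'
}

# Return 0 if the toolchain is acceptable for the given Spark version.
# Arguments: <spark version> <scala version> <JDK major version>
check_spark4_toolchain() {
  spark_version="$1"; scala_version="$2"; jdk_major="$3"
  case "$spark_version" in
    4.*) ;;               # Spark 4.x: continue to the checks below
    *) return 0 ;;        # other Spark lines are not constrained here
  esac
  # Assumption: any JDK >= 17 is accepted; the PR only states "JDK 17".
  if [ "$jdk_major" -lt 17 ]; then
    echo "Spark $spark_version needs JDK 17, found JDK $jdk_major" >&2
    return 1
  fi
  case "$scala_version" in
    2.13|2.13.*) ;;
    *) echo "Spark $spark_version needs Scala 2.13, found $scala_version" >&2
       return 1 ;;
  esac
  return 0
}
```

A caller would typically derive the third argument from the local JDK, for example by passing the quoted version from `java -version 2>&1` through `java_major`, and abort the build when the check returns non-zero.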
   
   ## Build / test
   
   Built locally on JDK 17:
   - `mvn -Pspark-4.0 -pl spark-doris-connector-spark-4.0 -am clean install` → shaded jar produced.
   - Regression: `mvn -Pspark-3.5 -pl spark-doris-connector-spark-3.5 -am clean install` (Scala 2.12) still green.
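
Both commands above follow the same profile-to-module naming pattern, so the new target and the regression target can be generated from one template. The helper name `connector_build_cmd` is hypothetical, and the snippet only prints the commands rather than running Maven.

```shell
# Hypothetical helper: prints the Maven invocation for one Spark profile,
# following the pattern of the two commands quoted in this PR.
connector_build_cmd() {
  profile="$1"   # e.g. spark-4.0 or spark-3.5
  echo "mvn -P${profile} -pl spark-doris-connector-${profile} -am clean install"
}

# Print the command for the new target and for the regression target.
for p in spark-4.0 spark-3.5; do
  connector_build_cmd "$p"
done
```

Each printed line can be run under the JDK that the corresponding profile expects.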
   
   ## Notes / deferred
   
   - The Spark 4.x CI step is a **compile + shade gate** (tests skipped). Running the 4.x unit/IT suites is deferred: `base`'s Arrow tests need Arrow allocation-manager alignment under Spark 4 (shaded Arrow 15 vs Spark 4's Arrow), and the IT module needs a Spark 4 run config.
   - The Arrow write path still uses Spark's internal `ArrowWriter` (unchanged from 3.x; pre-existing, out of scope here).
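
A compile + shade gate of this kind can be approximated by building with tests skipped and then checking that a jar was produced. The sketch below is not the actual `build-extension.yml` step: `has_shaded_jar` is a hypothetical helper, and it assumes the shade plugin's default behaviour of leaving the unshaded artifact next to the shaded one as `original-*.jar`.

```shell
# Hypothetical check: succeeds if DIR contains a jar other than the
# "original-*.jar" copy that maven-shade-plugin keeps by default.
has_shaded_jar() {
  dir="$1"
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # the glob matched nothing
    case "${jar##*/}" in
      original-*) ;;            # unshaded copy, ignore it
      *) return 0 ;;            # any other jar counts as the shaded output
    esac
  done
  return 1
}
```

After a build such as `mvn -Pspark-4.0 -pl spark-doris-connector-spark-4.0 -am -DskipTests clean package` (the `-DskipTests` flag is an assumption about how tests are skipped), the gate would call `has_shaded_jar spark-doris-connector-spark-4.0/target` and fail the job on a non-zero status. The check is deliberately rough: a sources or javadoc jar would also satisfy it.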
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

