LuciferYang opened a new pull request, #363: URL: https://github.com/apache/doris-spark-connector/pull/363

## What

Add Spark 4.0.x support to the connector (JDK 17, Scala 2.13), plus the foundational changes needed for the shared `base` module to cross-build Scala 2.12 and 2.13.

First of two stacked PRs for [#359](https://github.com/apache/doris-spark-connector/issues/359):

- **This PR:** Spark 4.0.x + all foundational changes.
- **Follow-up:** Spark 4.1.x on top of this.

## Changes

**New modules** (mirroring the `spark-3-base` + thin `spark-3.x` layout):

- `spark-doris-connector-spark-4-base`: shared Spark 4 DataSource V2 base (fork of `spark-3-base`).
- `spark-doris-connector-spark-4.0`: Spark 4.0.0 / Scala 2.13.16 adapter.

**Scala 2.13 readiness in `base`** (still builds on 2.12):

- `DorisRow`: drop the `toSeq` override and inherit Spark's `Row.toSeq` default (avoids a forced copy on 2.13, and is more correct).
- `DorisRelation` / `DorisArrowUtils` / `spark-4-base` `DorisTableBase`: build `StructType` from an `Array` instead of a mutable `Buffer`.
- `TestStreamLoadForArrowType`: `parallelize` over a `List`.

**Build / release wiring:**

- root pom: `spark-4-base` + `spark-4.0` modules, dependencyManagement, and the `spark-4.0` profile (Scala 2.13.16).
- `build.sh`: Scala 2.13 and Spark 4.0 options; require JDK 17 + Scala 2.13 for Spark 4.x.
- `build-extension.yml`: a separate JDK 17 job providing a compile + shade gate for the Spark 4.x modules.
- `deploy_staging_jars.sh`: stage-deploy `spark-4.0` under JDK 17 (`JAVA17_HOME`).

## Build / test

Built locally on JDK 17:

- `mvn -Pspark-4.0 -pl spark-doris-connector-spark-4.0 -am clean install` → shaded jar produced.
- Regression: `mvn -Pspark-3.5 -pl spark-doris-connector-spark-3.5 -am clean install` (Scala 2.12) still green.

## Notes / deferred

- The Spark 4.x CI step is a **compile + shade gate** (tests skipped). Running the 4.x unit/IT suites is deferred: `base`'s Arrow tests need Arrow allocation-manager alignment under Spark 4 (shaded Arrow 15 vs Spark 4's Arrow), and the IT module needs a Spark 4 run config.
- The Arrow write path still uses Spark's internal `ArrowWriter` (unchanged from 3.x; pre-existing, out of scope here).
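
For readers unfamiliar with why a mutable `Buffer` gets in the way of a 2.12/2.13 cross-build, here is a minimal sketch of the likely motivation. It is not the connector's code. In Scala 2.13 the default `scala.Seq` alias points at `scala.collection.immutable.Seq`, so a mutable `ArrayBuffer` no longer type-checks where a `Seq` is expected, while an `Array` is the same type on both versions. `Field`, `SchemaSketch`, `fromSeq`, and `fromArray` are hypothetical stand-ins for Spark's `StructField` and the `StructType` factories, chosen so the example runs without Spark on the classpath.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for Spark's StructField, so the sketch runs without Spark.
final case class Field(name: String, dataType: String)

object SchemaSketch {
  // Stand-in for a Seq-based factory. `Seq` here is scala.Seq:
  // collection.Seq on Scala 2.12, collection.immutable.Seq on Scala 2.13.
  def fromSeq(fields: Seq[Field]): Array[Field] = fields.toArray

  // Stand-in for an Array-based constructor; Array is the same type on both versions.
  def fromArray(fields: Array[Field]): Array[Field] = fields

  // Buffer-accumulating style, made cross-buildable by converting at the boundary.
  def viaBuffer(columns: List[(String, String)]): Array[Field] = {
    val buf = ArrayBuffer.empty[Field]
    columns.foreach { case (name, dataType) => buf += Field(name, dataType) }
    // fromSeq(buf) compiles on 2.12 but is a type mismatch on 2.13,
    // because a mutable ArrayBuffer is not an immutable.Seq.
    fromArray(buf.toArray)
  }

  // Direct style: map the source collection and materialize an Array in one step.
  def direct(columns: List[(String, String)]): Array[Field] =
    fromArray(columns.map { case (name, dataType) => Field(name, dataType) }.toArray)

  def main(args: Array[String]): Unit = {
    val columns = List("id" -> "INT", "name" -> "STRING")
    println(viaBuffer(columns).map(_.name).mkString(",")) // prints "id,name"
    println(direct(columns).map(_.name).mkString(","))    // prints "id,name"
  }
}
```

Both `viaBuffer` and `direct` compile unchanged on 2.12 and 2.13. Going through an `Array` avoids version-specific shims, at the cost of one copy when the data starts out in a buffer.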
