sezruby commented on PR #12226: URL: https://github.com/apache/gluten/pull/12226#issuecomment-4628391086
Followup status on the unbundling discussion:

- [#12244](https://github.com/apache/gluten/pull/12244) (drop the `15.0.0-gluten` artifact rename + dead `modify_arrow_dataset_scan_option.patch` from the Arrow JVM build): open and CI-green. Lets non-ppc64le contributors build from Maven Central without running `dev/build-arrow.sh`. Doesn't change runtime/bundling.
- [#12245](https://github.com/apache/gluten/pull/12245) (the actual unbundle: flip `arrow-memory-*` / `arrow-vector` to `scope=provided`, drop the `org.apache.arrow` shade-relocation block): closed.

cc @zhztheplayer @FelixYBW

CI on [#12245](https://github.com/apache/gluten/pull/12245) showed `spark-test-spark33` and `spark-test-spark34` failing. Root cause: bundled Arrow 15 is load-bearing for Spark < 3.5, because:

- Spark 3.3.1 ships Arrow 7.0.0
- Spark 3.4.4 ships Arrow 11.0.0
- Spark 3.5.5 ships Arrow 15.0.0
- Spark 4.0 / 4.1 ship Arrow 18.x

Today gluten compiles against Arrow 15 and wins classloader resolution because its bundled copy is on `extraClassPath`. Strip the bundled copy and on Spark 3.3 / 3.4 only the older Arrow remains at runtime, which surfaces as `NoSuchMethodError`.

Workarounds I considered:

1. Per-Spark-profile `<arrow.version>` (3.3→7.0, 3.4→11.0, 3.5→15.0, 4.x→18.1). Compiles, but means gluten on the Spark 3.3 profile is built against Arrow 7. That is exactly the *"memory and vector APIs should be stable across minor versions / this sounds a real risk"* concern, now spanning an eight-version gap. Too much surface area without per-version testing.
2. Conditional `<scope>` per Spark profile. Mechanical but ugly, leaves [#12225](https://github.com/apache/gluten/issues/12225) latent on 3.3 / 3.4.
3. Drop Spark 3.3 / 3.4 support. Out of scope.

[#12226](https://github.com/apache/gluten/pull/12226) already neutralized the immediate `NoSuchMethodError` from [#12225](https://github.com/apache/gluten/issues/12225) by un-shading the boundary types, so users on Spark 3.5+ are unblocked today.
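For anyone who wants to confirm which Arrow copy wins classloader resolution on a given Spark version, a minimal diagnostic sketch follows. It is not part of gluten: the class name `ArrowOrigin` and the helper `describe` are hypothetical, and the sketch assumes the un-relocated `org.apache.arrow` class names are visible on the classpath it runs under. The reported version comes from the jar manifest and may print as `unknown` if the jar does not set `Implementation-Version`.

```java
import java.security.CodeSource;

public class ArrowOrigin {
    // Hypothetical helper: reports which jar a class was loaded from, plus the
    // manifest Implementation-Version if the jar provides one.
    public static String describe(String className) {
        try {
            // initialize=false: we only want to locate the class, not run its static init.
            Class<?> cls = Class.forName(className, false, ArrowOrigin.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "<bootstrap or unknown>"
                    : src.getLocation().toString();
            Package pkg = cls.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className + " -> " + location
                    + " (version: " + (version == null ? "unknown" : version) + ")";
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Representative classes from arrow-vector and arrow-memory-core.
        System.out.println(describe("org.apache.arrow.vector.VectorSchemaRoot"));
        System.out.println(describe("org.apache.arrow.memory.BufferAllocator"));
    }
}
```

Run inside a driver or executor JVM: if the printed location is the Spark distribution's own Arrow jar rather than the gluten bundle, the older Arrow is what resolves at runtime, which is the situation on Spark 3.3 / 3.4 once the bundled copy is stripped.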
The full unbundling is a small diff (~3 poms) once gluten drops Spark 3.3 / 3.4; happy to revisit it then.
