sezruby commented on PR #12226:
URL: https://github.com/apache/gluten/pull/12226#issuecomment-4628391086

   Follow-up status on the unbundling discussion:
   
   - [#12244](https://github.com/apache/gluten/pull/12244) (drop the 
`15.0.0-gluten` artifact rename + dead `modify_arrow_dataset_scan_option.patch` 
from the Arrow JVM build): open and CI-green. Lets non-ppc64le contributors 
build from Maven Central without running `dev/build-arrow.sh`. Doesn't change 
runtime/bundling.
   - [#12245](https://github.com/apache/gluten/pull/12245) (the actual unbundle 
— flip `arrow-memory-*` / `arrow-vector` to `scope=provided`, drop the 
`org.apache.arrow` shade-relocation block): closed.
   
   cc @zhztheplayer @FelixYBW
   
   CI on [#12245](https://github.com/apache/gluten/pull/12245) showed 
`spark-test-spark33` and `spark-test-spark34` failing. Root cause: bundled 
Arrow 15 is load-bearing for Spark < 3.5, because:
   
   - Spark 3.3.1 ships Arrow 7.0.0
   - Spark 3.4.4 ships Arrow 11.0.0
   - Spark 3.5.5 ships Arrow 15.0.0
   - Spark 4.0 / 4.1 ship Arrow 18.x
   
   Today Gluten compiles against Arrow 15 and wins classloader resolution 
because its bundled copy is on `extraClassPath`. Strip the bundled copy and, on 
Spark 3.3 / 3.4, only the older Arrow remains at runtime, so any call to a 
method that exists in Arrow 15 but not in the older release fails with 
`NoSuchMethodError`.
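
   To make the exposure concrete, here is a minimal sketch that flags the Spark 
releases whose shipped Arrow is older than the Arrow that Gluten compiles 
against. The version table is copied from the list above; the names 
`SPARK_TO_ARROW`, `GLUTEN_ARROW`, `major`, and `at_risk` are hypothetical 
helpers for illustration and are not part of Gluten's build.

```python
# Arrow version shipped by each Spark release, as listed in this comment.
SPARK_TO_ARROW = {
    "3.3.1": "7.0.0",
    "3.4.4": "11.0.0",
    "3.5.5": "15.0.0",
    "4.0 / 4.1": "18.x",
}

# Arrow version Gluten compiles against today (per this comment).
GLUTEN_ARROW = "15.0.0"


def major(version: str) -> int:
    """Return the major component of a dotted version string."""
    return int(version.split(".")[0])


def at_risk(spark_to_arrow: dict, compiled_against: str) -> list:
    """Spark releases whose own Arrow is older than the compile-time Arrow."""
    return [
        spark
        for spark, arrow in spark_to_arrow.items()
        if major(arrow) < major(compiled_against)
    ]


print(at_risk(SPARK_TO_ARROW, GLUTEN_ARROW))  # ['3.3.1', '3.4.4']
```

   The two flagged releases match the failing `spark-test-spark33` and 
`spark-test-spark34` jobs. The sketch only checks the "runtime Arrow older than 
compile-time Arrow" direction, which is the failure mode CI showed.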
   
   Workarounds I considered:
   
   1. Per-Spark-profile `<arrow.version>` (3.3→7.0, 3.4→11.0, 3.5→15.0, 
4.x→18.1). This compiles, but it means Gluten on the Spark 3.3 profile is built 
against Arrow 7. That is exactly the *"memory and vector APIs should be stable 
across minor versions / this sounds a real risk"* concern, now spanning a gap 
of eight major versions (Arrow 7 to 15). Too much surface area without 
per-version testing.
   2. Conditional `<scope>` per Spark profile. Mechanical but ugly, leaves 
[#12225](https://github.com/apache/gluten/issues/12225) latent on 3.3 / 3.4.
   3. Drop Spark 3.3 / 3.4 support. Out of scope.
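
   As a rough illustration of why option 1 is risky, the sketch below computes 
how many Arrow major versions separate each profile from the Arrow 15 baseline 
that Gluten compiles against today. The profile-to-version mapping comes from 
option 1; `BASELINE_ARROW_MAJOR`, `PROFILE_ARROW_MAJOR`, and `major_gap` are 
hypothetical names used only for this example.

```python
# Arrow major version Gluten compiles against today (per this comment).
BASELINE_ARROW_MAJOR = 15

# Arrow major version each Spark profile would build against under option 1.
PROFILE_ARROW_MAJOR = {"3.3": 7, "3.4": 11, "3.5": 15, "4.x": 18}


def major_gap(profile_major: int, baseline: int = BASELINE_ARROW_MAJOR) -> int:
    """Number of major versions between a profile's Arrow and the baseline."""
    return abs(baseline - profile_major)


gaps = {spark: major_gap(arrow) for spark, arrow in PROFILE_ARROW_MAJOR.items()}
print(gaps)  # {'3.3': 8, '3.4': 4, '3.5': 0, '4.x': 3}
```

   Only the 3.5 profile builds against the same major version that is exercised 
today; every other profile would need its own test coverage.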
   
   [#12226](https://github.com/apache/gluten/pull/12226) already neutralized 
the immediate `NoSuchMethodError` from 
[#12225](https://github.com/apache/gluten/issues/12225) by un-shading the 
boundary types, so users on Spark 3.5+ are unblocked today. The full unbundling 
is a small diff (~3 poms) once Gluten drops Spark 3.3 / 3.4; happy to revisit 
it then.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

