sezruby opened a new pull request, #12226:
URL: https://github.com/apache/gluten/pull/12226
## What changes were proposed in this pull request?
Extend `package/pom.xml`'s `org.apache.arrow` relocation excludes to also
keep `org.apache.arrow.memory.**` and `org.apache.arrow.vector.**` unshaded.
The bundled Arrow C-Data classes (`org.apache.arrow.c.*`) are correctly
excluded from relocation because their native JNI code binds to the original
class names. However, their public API signatures take and return
`org.apache.arrow.memory.*` and `org.apache.arrow.vector.*` types, which were
being relocated. As a result, the bundled `ArrowArrayStream` / `ArrowSchema` /
`ArrowArray` / `Data` classes end up with signatures that reference the shaded
`BufferAllocator` / `VectorSchemaRoot`, so any caller passing a vanilla Apache
Arrow allocator hits `NoSuchMethodError`.
This affects any Spark workload that combines Gluten with another library
using Arrow C-Data (Iceberg's Arrow vector layer, Lance Java's writer,
Snowflake JDBC's Arrow result decoder, etc.) when Gluten's bundle wins
classloader resolution against vanilla Arrow.
## How was this patch tested?
Adds `dev/check-arrow-c-shading.sh` which runs `javap` on the produced
bundle jar and asserts that public method signatures reference unshaded Arrow
types. Wired into `package/pom.xml`'s `verify` phase via `exec-maven-plugin` so
regressions are caught in CI.
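
The check described above can be sketched roughly as follows. This is an illustrative approximation, not the actual `dev/check-arrow-c-shading.sh` from this PR: the function names `find_shaded_refs` and `check_bundle` are hypothetical, and the shaded prefix is assumed from the failing output quoted in this description.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the script added by this PR. Function names are hypothetical.

# Assumed shaded prefix, taken from the failing output quoted in this PR.
SHADED_PREFIX='org.apache.gluten.shaded.org.apache.arrow.'

# Reads `javap -public` output on stdin and prints every line that mentions
# a shaded Arrow type. Exit status is grep's: 0 if any match, 1 if none.
find_shaded_refs() {
  grep -F -- "$SHADED_PREFIX"
}

# Inspects the four C-Data classes named in this PR inside the given jar.
# Returns 0 when none of their public signatures mention shaded Arrow types.
# Caveat: a class that javap cannot load produces no output, so it is
# silently treated as clean in this sketch.
check_bundle() {
  local jar="$1" failed=0 cls refs
  for cls in ArrowArrayStream ArrowSchema ArrowArray Data; do
    refs="$(javap -public -classpath "$jar" "org.apache.arrow.c.$cls" 2>/dev/null \
      | find_shaded_refs)"
    if [ -n "$refs" ]; then
      printf 'FAIL org/apache/arrow/c/%s\n%s\n' "$cls" "$refs"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

With the functions sourced, `check_bundle /path/to/bundle.jar` returns non-zero when any of the four classes still exposes a shaded type, which is the kind of exit status a `verify`-phase hook could fail the build on.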
Tested against the upstream
`gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.6.0.jar`:
```
$ dev/check-arrow-c-shading.sh /path/to/gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.6.0.jar
FAIL org/apache/arrow/c/ArrowArrayStream — public API references gluten-shaded Arrow types:
  public static org.apache.arrow.c.ArrowArrayStream allocateNew(org.apache.gluten.shaded.org.apache.arrow.memory.BufferAllocator);
FAIL org/apache/arrow/c/ArrowSchema — public API references gluten-shaded Arrow types:
  public static org.apache.arrow.c.ArrowSchema allocateNew(org.apache.gluten.shaded.org.apache.arrow.memory.BufferAllocator);
FAIL org/apache/arrow/c/ArrowArray — public API references gluten-shaded Arrow types:
  public static org.apache.arrow.c.ArrowArray allocateNew(org.apache.gluten.shaded.org.apache.arrow.memory.BufferAllocator);
FAIL org/apache/arrow/c/Data — public API references gluten-shaded Arrow types:
  [16 methods touching shaded org.apache.arrow.memory/vector types]
Bundle has 4 Arrow C-Data class(es) with shaded API types.
exit code: 1
```
After applying the relocation exclude change, a freshly built bundle should
pass the same check (script exits 0). The repro from #12225 (3 lines calling
`ArrowArrayStream.allocateNew(new RootAllocator(...))`) goes from
`NoSuchMethodError` to `OK`.
## Closes
#12225