sezruby opened a new pull request, #12226:
URL: https://github.com/apache/gluten/pull/12226

   ## What changes were proposed in this pull request?
   
   Extend `package/pom.xml`'s `org.apache.arrow` relocation excludes to also 
keep `org.apache.arrow.memory.**` and `org.apache.arrow.vector.**` unshaded.
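The following is a simplified model of what the extra excludes change. It performs prefix relocation and skips anything under an `a.b.**` exclude. This is a hypothetical sketch, not maven-shade-plugin's implementation, whose exclude matching has more rules. The names `RelocationSketch`, `relocate`, and `excluded` are illustrative. The "before" exclude list is assumed to contain only the C-Data package. The shaded prefix is the one that appears in the check output below.

```java
import java.util.List;

/** Hypothetical, simplified model of prefix relocation with "pkg.**" excludes. */
class RelocationSketch {
    static final String PATTERN = "org.apache.arrow";
    // Shaded prefix as seen in the check output; assumed, not read from package/pom.xml.
    static final String SHADED = "org.apache.gluten.shaded.org.apache.arrow";

    /** True if className is in the package named by a "pkg.**" exclude or a subpackage. */
    static boolean excluded(String className, List<String> excludes) {
        for (String ex : excludes) {
            String pkg = ex.endsWith(".**") ? ex.substring(0, ex.length() - 3) : ex;
            if (className.equals(pkg) || className.startsWith(pkg + ".")) {
                return true;
            }
        }
        return false;
    }

    /** Rewrites the PATTERN prefix to SHADED unless the class is excluded. */
    static String relocate(String className, List<String> excludes) {
        if (!className.startsWith(PATTERN + ".") || excluded(className, excludes)) {
            return className;
        }
        return SHADED + className.substring(PATTERN.length());
    }

    public static void main(String[] args) {
        // Assumed exclude lists before and after this PR.
        List<String> before = List.of("org.apache.arrow.c.**");
        List<String> after = List.of("org.apache.arrow.c.**",
                "org.apache.arrow.memory.**", "org.apache.arrow.vector.**");
        String allocator = "org.apache.arrow.memory.BufferAllocator";
        System.out.println("before: " + relocate(allocator, before));
        System.out.println("after:  " + relocate(allocator, after));
    }
}
```

With the "before" list, `BufferAllocator` is rewritten to the `org.apache.gluten.shaded...` name. With the "after" list, it keeps its vanilla name, so signatures that mention it stay unshaded.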
   
   The bundled Arrow C-Data classes (`org.apache.arrow.c.*`) are correctly excluded from relocation, because their native (JNI) code binds to the original class names. However, their public API signatures take and return `org.apache.arrow.memory.*` and `org.apache.arrow.vector.*` types, and those packages were still being relocated. As a result, relocation rewrites the bytecode of the bundled `ArrowArrayStream`, `ArrowSchema`, `ArrowArray`, and `Data` classes so that their method signatures reference the shaded `BufferAllocator` and `VectorSchemaRoot`. Any caller that passes a vanilla Apache Arrow allocator therefore hits `NoSuchMethodError`.
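The error follows from how the JVM links calls. A call site names a method by its name plus its descriptor, and the descriptor embeds the fully qualified parameter and return types. The sketch below spells out the two descriptors for `allocateNew`, taking the signature from the check output below. `DescriptorSketch` and `descriptor` are hypothetical helpers, not part of Arrow or Gluten.

```java
import java.util.List;

/** Hypothetical helper that spells out JVM method descriptors (reference types only). */
class DescriptorSketch {
    /** Builds "(Lp/A;Lp/B;)Lp/R;" from fully qualified class names; no primitives or arrays. */
    static String descriptor(List<String> params, String returnType) {
        StringBuilder sb = new StringBuilder("(");
        for (String p : params) {
            sb.append('L').append(p.replace('.', '/')).append(';');
        }
        return sb.append(")L").append(returnType.replace('.', '/')).append(';').toString();
    }

    public static void main(String[] args) {
        String ret = "org.apache.arrow.c.ArrowArrayStream";
        // What a caller compiled against vanilla Arrow asks the JVM to link.
        String callerSide = descriptor(List.of("org.apache.arrow.memory.BufferAllocator"), ret);
        // What the relocated class inside the bundle actually declares.
        String bundleSide = descriptor(
                List.of("org.apache.gluten.shaded.org.apache.arrow.memory.BufferAllocator"), ret);
        System.out.println("caller: allocateNew" + callerSide);
        System.out.println("bundle: allocateNew" + bundleSide);
        // Different descriptors mean method resolution fails, surfacing as NoSuchMethodError.
        System.out.println("match:  " + callerSide.equals(bundleSide));
    }
}
```

Only the parameter type differs. That difference is enough for method resolution to find no match.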
   
   This affects any Spark workload that combines Gluten with another library using Arrow C-Data (Iceberg's Arrow vector layer, Lance Java's writer, Snowflake JDBC's Arrow result decoder, etc.) whenever the classloader resolves the Arrow C-Data classes from the Gluten bundle rather than from vanilla Arrow.
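To see whether a given JVM is exposed, you can list every copy of an Arrow C-Data class visible to a classloader. The sketch below does this with the standard `ClassLoader.getResources`. `ClasspathCopies` is a hypothetical name. The note that "the first copy is usually the one loaded" is an assumption. It holds for ordinary parent-first loaders, but may not hold when Spark runs with `spark.driver.userClassPathFirst` or `spark.executor.userClassPathFirst`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic: lists every classpath entry that provides a given class. */
class ClasspathCopies {
    static List<URL> copiesOf(String className, ClassLoader loader) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.arrow.c.ArrowArrayStream";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        List<URL> copies = copiesOf(name, loader);
        if (copies.isEmpty()) {
            System.out.println(name + ": not on the classpath");
        }
        for (URL url : copies) {
            System.out.println(name + ": " + url);
        }
        if (copies.size() > 1) {
            // Assumption: with parent-first delegation the first entry is normally the one loaded.
            System.out.println("WARNING: multiple copies; the first is usually the one loaded");
        }
    }
}
```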
   
   ## How was this patch tested?
   
   Adds `dev/check-arrow-c-shading.sh` which runs `javap` on the produced 
bundle jar and asserts that public method signatures reference unshaded Arrow 
types. Wired into `package/pom.xml`'s `verify` phase via `exec-maven-plugin` so 
regressions are caught in CI.
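The idea behind the check can be restated in Java: run `javap -public` on the C-Data classes and flag any line that mentions the gluten-shaded Arrow prefix. This is a hypothetical restatement, not the contents of `dev/check-arrow-c-shading.sh`. `ShadedSignatureScan`, `shadedLines`, and `javapPublic` are illustrative names. The sketch checks only the four classes named in the output below, whereas a fuller version would enumerate the jar's `org/apache/arrow/c/` entries. It invokes javap in-process through the JDK's `java.util.spi.ToolProvider`.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.spi.ToolProvider;

/** Hypothetical Java restatement of the shading check; not the actual dev/ script. */
class ShadedSignatureScan {
    static final String SHADED_PREFIX = "org.apache.gluten.shaded.org.apache.arrow.";

    /** Returns the javap output lines that mention a gluten-shaded Arrow type. */
    static List<String> shadedLines(String javapOutput) {
        List<String> hits = new ArrayList<>();
        for (String line : javapOutput.split("\\R")) {
            if (line.contains(SHADED_PREFIX)) {
                hits.add(line.trim());
            }
        }
        return hits;
    }

    /** Runs the JDK's javap in-process on one class inside the given jar. */
    static String javapPublic(String jar, String className) {
        ToolProvider javap = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap not available"));
        StringWriter out = new StringWriter();
        PrintWriter pw = new PrintWriter(out);
        javap.run(pw, pw, "-public", "-cp", jar, className);
        pw.flush();
        return out.toString();
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java ShadedSignatureScan.java <bundle.jar>");
            System.exit(2);
        }
        int failures = 0;
        for (String cls : List.of("org.apache.arrow.c.ArrowArrayStream", "org.apache.arrow.c.ArrowSchema",
                "org.apache.arrow.c.ArrowArray", "org.apache.arrow.c.Data")) {
            List<String> hits = shadedLines(javapPublic(args[0], cls));
            if (!hits.isEmpty()) {
                failures++;
                System.out.println("FAIL " + cls);
                hits.forEach(h -> System.out.println("    " + h));
            }
        }
        // Non-zero exit lets a build step (e.g. an exec-maven-plugin execution) fail the build.
        System.exit(failures == 0 ? 0 : 1);
    }
}
```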
   
   Tested against the upstream 
`gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.6.0.jar`:
   
   ```
   $ dev/check-arrow-c-shading.sh /path/to/gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.6.0.jar
     FAIL org/apache/arrow/c/ArrowArrayStream — public API references gluten-shaded Arrow types:
         public static org.apache.arrow.c.ArrowArrayStream allocateNew(
           org.apache.gluten.shaded.org.apache.arrow.memory.BufferAllocator);
     FAIL org/apache/arrow/c/ArrowSchema — public API references gluten-shaded Arrow types:
         public static org.apache.arrow.c.ArrowSchema allocateNew(
           org.apache.gluten.shaded.org.apache.arrow.memory.BufferAllocator);
     FAIL org/apache/arrow/c/ArrowArray — public API references gluten-shaded Arrow types:
         public static org.apache.arrow.c.ArrowArray allocateNew(
           org.apache.gluten.shaded.org.apache.arrow.memory.BufferAllocator);
     FAIL org/apache/arrow/c/Data — public API references gluten-shaded Arrow types:
         [16 methods touching shaded org.apache.arrow.memory/vector types]
         [16 methods touching shaded org.apache.arrow.memory/vector types]
   
   Bundle has 4 Arrow C-Data class(es) with shaded API types.
   exit code: 1
   ```
   
   After applying the relocation-exclude change, a freshly built bundle should pass the same check (the script exits 0). The repro from #12225 (3 lines calling `ArrowArrayStream.allocateNew(new RootAllocator(...))`) goes from `NoSuchMethodError` to `OK`.
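A deployed classpath can also be checked without compiling against Arrow. A reflective probe can ask whether `ArrowArrayStream` exposes `allocateNew` with the vanilla `BufferAllocator` parameter. This is a hedged sketch, not the repro from #12225. `SignatureProbe` and its `Result` values are hypothetical. `Class.getMethod` throwing `NoSuchMethodException` is the reflective counterpart of the link-time `NoSuchMethodError`.

```java
/** Hypothetical reflective probe: does `owner` have a public method `method(paramTypes...)`? */
class SignatureProbe {
    enum Result { OK, SIGNATURE_MISMATCH, CLASS_MISSING }

    static Result probe(ClassLoader loader, String owner, String method, String... paramTypes) {
        try {
            // initialize=false: resolve the classes without running static initializers.
            Class<?> cls = Class.forName(owner, false, loader);
            Class<?>[] params = new Class<?>[paramTypes.length];
            for (int i = 0; i < paramTypes.length; i++) {
                params[i] = Class.forName(paramTypes[i], false, loader);
            }
            cls.getMethod(method, params);
            return Result.OK;
        } catch (ClassNotFoundException e) {
            return Result.CLASS_MISSING;
        } catch (NoSuchMethodException e) {
            // E.g. the bundle's ArrowArrayStream only declares allocateNew(shaded BufferAllocator).
            return Result.SIGNATURE_MISMATCH;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        System.out.println(probe(loader, "org.apache.arrow.c.ArrowArrayStream", "allocateNew",
                "org.apache.arrow.memory.BufferAllocator"));
    }
}
```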
   
   ## Closes
   
   #12225


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

