[I] java.lang.UnsatisfiedLinkError when reading CSV from S3 by arrow's csv reader [incubator-gluten]

via GitHub Fri, 18 Apr 2025 03:13:01 -0700


squalud opened a new issue, #9365:
URL: https://github.com/apache/incubator-gluten/issues/9365


   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   I built Gluten from source using:
   ./dev/buildbundle-veloxbe.sh --enable_hdfs=ON --enable_s3=ON --enable_vcpkg=ON --spark_version=3.5
   
   After a successful build, I ran PySpark with the following properties:
   spark.driver.extraClassPath: 'xxxxx'
   spark.executor.extraClassPath: 'xxxxx'
   spark.gluten.enabled: true
   spark.gluten.sql.columnar.forcescan: true
   spark.gluten.sql.columnar.filescan: true
   spark.gluten.sql.native.arrow.reader.enabled: true
   spark.plugins: 'org.apache.gluten.GlutenPlugin'
   spark.shuffle.manager: 'org.apache.spark.shuffle.sort.ColumnarShuffleManager'
   spark.memory.offHeap.enabled: true
   spark.memory.offHeap.size: 8g
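
For anyone trying to reproduce this, the properties above can be turned into command-line flags. The report does not say how the properties were actually supplied (the key/value layout suggests a config file), so the sketch below is only one possible way: it assumes they are passed as `--conf key=value` arguments. The names `GLUTEN_CONF` and `to_conf_args` are illustrative, and `xxxxx` is the redacted placeholder from the report, left as-is.

```python
# Properties copied from the report; "xxxxx" is the reporter's redacted
# classpath placeholder and must be replaced with the real Gluten bundle path.
GLUTEN_CONF = {
    "spark.driver.extraClassPath": "xxxxx",
    "spark.executor.extraClassPath": "xxxxx",
    "spark.gluten.enabled": "true",
    "spark.gluten.sql.columnar.forcescan": "true",
    "spark.gluten.sql.columnar.filescan": "true",
    "spark.gluten.sql.native.arrow.reader.enabled": "true",
    "spark.plugins": "org.apache.gluten.GlutenPlugin",
    "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "8g",
}


def to_conf_args(conf):
    """Render a property dict as a flat list of `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print a command line that could be pasted into a shell.
    print(" ".join(["pyspark"] + to_conf_args(GLUTEN_CONF)))
```

The same dictionary could instead be written out as `spark-defaults.conf` entries; only the rendering function would change.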
   
   
   I tried to read a CSV file from S3 using Arrow's CSV reader, and the following error was reported:
   
   SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (xx.xx.xx.xx executor 1): org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Error during calling Java code from native code: org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
   Error Source: RUNTIME
   Error Code: INVALID_STATE
   Reason: Operator::getOutput failed for [operator: ValueStream, plan node ID: 0]: Error during calling Java code from native code: java.lang.UnsatisfiedLinkError: /tmp/jnilib-17305424060615380389.tmp: /tmp/jnilib-17305424060615380389.tmp: undefined symbol: _ZNK3Aws2S38S3Client13CreateSessionERKNS0_5Model20CreateSessionRequestE
        at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
        at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:388)
        at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:232)
        at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:174)
        at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2394)
        at java.base/java.lang.Runtime.load0(Runtime.java:755)
        at java.base/java.lang.System.load(System.java:1970)
        at org.apache.arrow.dataset.jni.JniLoader.load(JniLoader.java:92)
        at org.apache.arrow.dataset.jni.JniLoader.loadRemaining(JniLoader.java:75)
        at org.apache.arrow.dataset.jni.JniLoader.ensureLoaded(JniLoader.java:61)
        at org.apache.arrow.dataset.jni.NativeMemoryPool.createListenable(NativeMemoryPool.java:44)
        at org.apache.gluten.memory.arrow.pool.ArrowNativeMemoryPool.<init>(ArrowNativeMemoryPool.java:34)
        at org.apache.gluten.memory.arrow.pool.ArrowNativeMemoryPool.createArrowNativeMemoryPool(ArrowNativeMemoryPool.java:47)
        at org.apache.gluten.memory.arrow.pool.ArrowNativeMemoryPool.lambda$arrowPool$0(ArrowNativeMemoryPool.java:42)
        at org.apache.spark.task.TaskResourceRegistry.$anonfun$addResourceIfNotRegistered$1(Task...
   
   
   How can I work around this?
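
For reference, the undefined symbol in the trace is an Itanium C++ ABI mangled name. The minimal sketch below decodes only the qualified function name from it; the helper `nested_name` is illustrative, handles just the simple `_ZN...E` nested-name prefix, and does not decode the parameter types. Running the full symbol through a tool such as `c++filt` should give `Aws::S3::S3Client::CreateSession(Aws::S3::Model::CreateSessionRequest const&) const`, i.e. a member of the AWS SDK for C++ S3 client.

```python
def nested_name(mangled):
    """Return the '::'-joined qualified name of an Itanium-mangled symbol.

    Only the simple form `_ZN[cv-qualifiers]<len><id>...E` is handled;
    templates, substitutions and parameter types are not decoded.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos = 3
    # Skip cv-qualifiers of the member function (r = restrict, V = volatile, K = const).
    while mangled[pos] in "rVK":
        pos += 1
    parts = []
    while mangled[pos] != "E":
        start = pos
        # Each component is a decimal length followed by that many characters.
        while mangled[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"unsupported construct at offset {pos}")
        length = int(mangled[start:pos])
        parts.append(mangled[pos:pos + length])
        pos += length
    return "::".join(parts)


# Symbol copied verbatim from the UnsatisfiedLinkError in the report.
SYMBOL = "_ZNK3Aws2S38S3Client13CreateSessionERKNS0_5Model20CreateSessionRequestE"

if __name__ == "__main__":
    print(nested_name(SYMBOL))  # Aws::S3::S3Client::CreateSession
```

So the library extracted to `/tmp/jnilib-17305424060615380389.tmp` references `S3Client::CreateSession`, and the dynamic loader could not resolve it when the library was loaded.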
   
   ### Gluten version
   
   Gluten-1.3
   
   ### Spark version
   
   Spark-3.5.x
   
   ### Spark configurations
   
   spark.driver.extraClassPath: 'xxxxx'
   spark.executor.extraClassPath: 'xxxxx'
   spark.gluten.enabled: true
   spark.gluten.sql.columnar.forcescan: true
   spark.gluten.sql.columnar.filescan: true
   spark.gluten.sql.native.arrow.reader.enabled: true
   spark.plugins: 'org.apache.gluten.GlutenPlugin'
   spark.shuffle.manager: 'org.apache.spark.shuffle.sort.ColumnarShuffleManager'
   spark.memory.offHeap.enabled: true
   spark.memory.offHeap.size: 8g
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (xx.xx.xx.xx executor 1): org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Error during calling Java code from native code: org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
   Error Source: RUNTIME
   Error Code: INVALID_STATE
   Reason: Operator::getOutput failed for [operator: ValueStream, plan node ID: 0]: Error during calling Java code from native code: java.lang.UnsatisfiedLinkError: /tmp/jnilib-17305424060615380389.tmp: /tmp/jnilib-17305424060615380389.tmp: undefined symbol: _ZNK3Aws2S38S3Client13CreateSessionERKNS0_5Model20CreateSessionRequestE
        at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
        at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:388)
        at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:232)
        at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:174)
        at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2394)
        at java.base/java.lang.Runtime.load0(Runtime.java:755)
        at java.base/java.lang.System.load(System.java:1970)
        at org.apache.arrow.dataset.jni.JniLoader.load(JniLoader.java:92)
        at org.apache.arrow.dataset.jni.JniLoader.loadRemaining(JniLoader.java:75)
        at org.apache.arrow.dataset.jni.JniLoader.ensureLoaded(JniLoader.java:61)
        at org.apache.arrow.dataset.jni.NativeMemoryPool.createListenable(NativeMemoryPool.java:44)
        at org.apache.gluten.memory.arrow.pool.ArrowNativeMemoryPool.<init>(ArrowNativeMemoryPool.java:34)
        at org.apache.gluten.memory.arrow.pool.ArrowNativeMemoryPool.createArrowNativeMemoryPool(ArrowNativeMemoryPool.java:47)
        at org.apache.gluten.memory.arrow.pool.ArrowNativeMemoryPool.lambda$arrowPool$0(ArrowNativeMemoryPool.java:42)
        at org.apache.spark.task.TaskResourceRegistry.$anonfun$addResourceIfNotRegistered$1(Task...
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

