squalud opened a new issue, #9365:
URL: https://github.com/apache/incubator-gluten/issues/9365
### Backend
VL (Velox)
### Bug description
I built Gluten from source using:
./dev/buildbundle-veloxbe.sh --enable_hdfs=ON --enable_s3=ON --enable_vcpkg=ON --spark_version=3.5
After a successful build, I ran pyspark with the following properties:
spark.driver.extraClassPath: 'xxxxx'
spark.executor.extraClassPath: 'xxxxx'
spark.gluten.enabled: true
spark.gluten.sql.columnar.forcescan: true
spark.gluten.sql.columnar.filescan: true
spark.gluten.sql.native.arrow.reader.enabled: true
spark.plugins: 'org.apache.gluten.GlutenPlugin'
spark.shuffle.manager: 'org.apache.spark.shuffle.sort.ColumnarShuffleManager'
spark.memory.offHeap.enabled: true
spark.memory.offHeap.size: 8g
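For anyone reproducing this from the command line instead of a properties file, here is a minimal sketch that turns the same properties into `--conf key=value` arguments for `pyspark`. The helper name `to_conf_args` is hypothetical, and the `'xxxxx'` class path values are placeholders exactly as above.

```python
# Properties copied from the list above; 'xxxxx' values are placeholders.
GLUTEN_PROPS = {
    "spark.driver.extraClassPath": "xxxxx",
    "spark.executor.extraClassPath": "xxxxx",
    "spark.gluten.enabled": "true",
    "spark.gluten.sql.columnar.forcescan": "true",
    "spark.gluten.sql.columnar.filescan": "true",
    "spark.gluten.sql.native.arrow.reader.enabled": "true",
    "spark.plugins": "org.apache.gluten.GlutenPlugin",
    "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "8g",
}


def to_conf_args(props):
    """Render a dict of Spark properties as a flat list of --conf arguments."""
    args = []
    for key, value in props.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print a command line that can be pasted into a shell.
    print(" ".join(["pyspark"] + to_conf_args(GLUTEN_PROPS)))
```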
I tried to read a CSV file from S3 using Arrow's CSV reader, and the following
error was reported:
SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (xx.xx.xx.xx executor 1): org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Error during calling Java code from native code: org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Operator::getOutput failed for [operator: ValueStream, plan node ID: 0]: Error during calling Java code from native code: java.lang.UnsatisfiedLinkError: /tmp/jnilib-17305424060615380389.tmp: /tmp/jnilib-17305424060615380389.tmp: undefined symbol: _ZNK3Aws2S38S3Client13CreateSessionERKNS0_5Model20CreateSessionRequestE
at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:388)
at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:232)
at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:174)
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2394)
at java.base/java.lang.Runtime.load0(Runtime.java:755)
at java.base/java.lang.System.load(System.java:1970)
at org.apache.arrow.dataset.jni.JniLoader.load(JniLoader.java:92)
at org.apache.arrow.dataset.jni.JniLoader.loadRemaining(JniLoader.java:75)
at org.apache.arrow.dataset.jni.JniLoader.ensureLoaded(JniLoader.java:61)
at org.apache.arrow.dataset.jni.NativeMemoryPool.createListenable(NativeMemoryPool.java:44)
at org.apache.gluten.memory.arrow.pool.ArrowNativeMemoryPool.<init>(ArrowNativeMemoryPool.java:34)
at org.apache.gluten.memory.arrow.pool.ArrowNativeMemoryPool.createArrowNativeMemoryPool(ArrowNativeMemoryPool.java:47)
at org.apache.gluten.memory.arrow.pool.ArrowNativeMemoryPool.lambda$arrowPool$0(ArrowNativeMemoryPool.java:42)
at org.apache.spark.task.TaskResourceRegistry.$anonfun$addResourceIfNotRegistered$1(Task...
How can I work around this?
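To make the error easier to read: the undefined symbol is an Itanium C++ ABI mangled name, and its leading part decodes to `Aws::S3::S3Client::CreateSession`, a method of the AWS SDK for C++ S3 client. Below is a minimal sketch that extracts just that qualified name. The function `nested_name_parts` is a hypothetical helper that only handles plain length-prefixed components; a full demangler such as `c++filt` covers the complete grammar, including the argument types.

```python
def nested_name_parts(mangled: str) -> list[str]:
    """Return the components of the leading nested name in an Itanium-ABI symbol.

    Only plain <length><identifier> components are supported; templates,
    substitutions and operator names raise ValueError.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested Itanium-ABI name")
    pos = 3
    # Skip cv-qualifiers of a member function (r = restrict, V = volatile, K = const).
    while mangled[pos] in "rVK":
        pos += 1
    parts = []
    # The nested name is terminated by 'E'.
    while mangled[pos] != "E":
        start = pos
        while mangled[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"unsupported construct at offset {pos}")
        length = int(mangled[start:pos])
        parts.append(mangled[pos:pos + length])
        pos += length
    return parts


if __name__ == "__main__":
    symbol = "_ZNK3Aws2S38S3Client13CreateSessionERKNS0_5Model20CreateSessionRequestE"
    print("::".join(nested_name_parts(symbol)))
```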
### Gluten version
Gluten-1.3
### Spark version
Spark-3.5.x
### Spark configurations
spark.driver.extraClassPath: 'xxxxx'
spark.executor.extraClassPath: 'xxxxx'
spark.gluten.enabled: true
spark.gluten.sql.columnar.forcescan: true
spark.gluten.sql.columnar.filescan: true
spark.gluten.sql.native.arrow.reader.enabled: true
spark.plugins: 'org.apache.gluten.GlutenPlugin'
spark.shuffle.manager: 'org.apache.spark.shuffle.sort.ColumnarShuffleManager'
spark.memory.offHeap.enabled: true
spark.memory.offHeap.size: 8g
### System information
_No response_
### Relevant logs
```bash
SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (xx.xx.xx.xx executor 1): org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Error during calling Java code from native code: org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Operator::getOutput failed for [operator: ValueStream, plan node ID: 0]: Error during calling Java code from native code: java.lang.UnsatisfiedLinkError: /tmp/jnilib-17305424060615380389.tmp: /tmp/jnilib-17305424060615380389.tmp: undefined symbol: _ZNK3Aws2S38S3Client13CreateSessionERKNS0_5Model20CreateSessionRequestE
at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:388)
at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:232)
at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:174)
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2394)
at java.base/java.lang.Runtime.load0(Runtime.java:755)
at java.base/java.lang.System.load(System.java:1970)
at org.apache.arrow.dataset.jni.JniLoader.load(JniLoader.java:92)
at org.apache.arrow.dataset.jni.JniLoader.loadRemaining(JniLoader.java:75)
at org.apache.arrow.dataset.jni.JniLoader.ensureLoaded(JniLoader.java:61)
at org.apache.arrow.dataset.jni.NativeMemoryPool.createListenable(NativeMemoryPool.java:44)
at org.apache.gluten.memory.arrow.pool.ArrowNativeMemoryPool.<init>(ArrowNativeMemoryPool.java:34)
at org.apache.gluten.memory.arrow.pool.ArrowNativeMemoryPool.createArrowNativeMemoryPool(ArrowNativeMemoryPool.java:47)
at org.apache.gluten.memory.arrow.pool.ArrowNativeMemoryPool.lambda$arrowPool$0(ArrowNativeMemoryPool.java:42)
at org.apache.spark.task.TaskResourceRegistry.$anonfun$addResourceIfNotRegistered$1(Task...
```