beliefer opened a new issue, #10519:
URL: https://github.com/apache/incubator-gluten/issues/10519
### Backend
VL (Velox)
### Bug description
I built `libhdfs3.so` with `dev/build_libhdfs3.sh` on branch 1.4.0.
I copied it into `${HADOOP_HOME}/lib/native/` and removed the existing `libhdfs.so`.
Finally, I renamed `libhdfs3.so` to `libhdfs.so`.
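
For anyone trying to reproduce the setup, the replacement steps above can be sketched roughly as follows. This is a minimal illustration, not part of Gluten: the helper name `swap_in_libhdfs3` and the `libhdfs.so.bak` backup are assumptions added here (the original steps deleted the old library instead of backing it up).

```python
import shutil
from pathlib import Path


def swap_in_libhdfs3(native_dir: Path, built_lib: Path) -> Path:
    """Install built_lib as libhdfs.so inside native_dir (hypothetical helper)."""
    target = native_dir / "libhdfs.so"
    if target.exists() or target.is_symlink():
        # Assumption: keep the previous library as a backup instead of
        # deleting it, so the change can be reverted.
        target.rename(native_dir / "libhdfs.so.bak")
    # Copy the freshly built libhdfs3.so under the name libhdfs.so.
    shutil.copy2(built_lib, target)
    return target
```

It would be called with `Path(os.environ["HADOOP_HOME"]) / "lib" / "native"` as `native_dir` and the path of the built `libhdfs3.so` as `built_lib`, on every node that runs executors.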
I got the error shown below when submitting a Spark SQL query.
```
Spark master: yarn, Application Id: application_1731493565474_0482
Time taken: 1.675 seconds
org.apache.spark.SparkException: Job aborted due to stage failure: Aborting TaskSet 3.0 because task 25 (partition 25) cannot run anywhere due to node and executor excludeOnFailure.
Most recent failure:
Lost task 25.0 in stage 3.0 (TID 2030) (host1 executor 3): org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Error during calling Java code from native code: org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Unable to connect to HDFS: cluster, got error: GetLastExceptionRootCause return null.
Retriable: False
Expression: hdfsClient_ != nullptr
Context: Split [Hive: hdfs://cluster/user/hadoop/performance-datasets/tpcds/sf1000-parquet/useDecimal=false,useDate=false,filterNull=false/customer/part-00000-ffaa28bd-4482-4724-b6fc-ebde6b750cef-c000.snappy.parquet 104857600 - 4194304] Task Gluten_Stage_3_TID_2030_VTID_3
Additional Context: Operator: TableScan[0] 0
Function: Impl
File: /home/hadoop/gluten/ep/build-velox/build/velox_ep/velox/connectors/hive/storage_adapters/hdfs/HdfsFileSystem.cpp
Line: 51
Stack trace:
# 0 _ZN8facebook5velox7process10StackTraceC1Ei
# 1 _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2 _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3 _ZN8facebook5velox11filesystems14HdfsFileSystem4ImplC1EPKNS0_6config10ConfigBaseERKNS1_19HdfsServiceEndpointE
# 4 _ZN8facebook5velox11filesystems14HdfsFileSystemC2ERKSt10shared_ptrIKNS0_6config10ConfigBaseEERKNS1_19HdfsServiceEndpointE
# 5 _ZN5folly15basic_once_flagINS_15SharedMutexImplILb0EvSt6atomicNS_24SharedMutexPolicyDefaultEEES2_E14call_once_slowIZZN8facebook5velox11filesystems23hdfsFileSystemGeneratorEvENKUlSt10shared_ptrIKNS8_6config10ConfigBaseEESt17basic_string_viewIcSt11char_traitsIcEEE_clESE_SI_EUlvE_JEEEvOT_DpOT0_
# 6 _ZZN8facebook5velox11filesystems23hdfsFileSystemGeneratorEvENKUlSt10shared_ptrIKNS0_6config10ConfigBaseEESt17basic_string_viewIcSt11char_traitsIcEEE_clES6_SA_.constprop.0
# 7 _ZNSt17_Function_handlerIFSt10shared_ptrIN8facebook5velox11filesystems10FileSystemEES0_IKNS2_6config10ConfigBaseEESt17basic_string_viewIcSt11char_traitsIcEEEZNS3_23hdfsFileSystemGeneratorEvEUlS9_SD_E_E9_M_invokeERKSt9_Any_dataOS9_OSD_
# 8 _ZN8facebook5velox11filesystems13getFileSystemESt17basic_string_viewIcSt11char_traitsIcEESt10shared_ptrIKNS0_6config10ConfigBaseEE
# 9 _ZN8facebook5velox19FileHandleGeneratorclERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPKNS0_14FilePropertiesEPNS0_11filesystems4File7IoStatsE
# 10 _ZN8facebook5velox13CachedFactoryINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_10FileHandleENS0_19FileHandleGeneratorENS0_14FilePropertiesENS0_11filesystems4File7IoStatsENS0_15FileHandleSizerESt8equal_toIS7_ESt4hashIS7_EE8generateERKS7_PKSA_PSD_
# 11 _ZN8facebook5velox9connector4hive11SplitReader12createReaderEv
# 12 _ZN8facebook5velox9connector4hive11SplitReader12prepareSplitESt10shared_ptrINS0_6common14MetadataFilterEERNS0_4dwio6common17RuntimeStatisticsE
# 13 _ZN8facebook5velox9connector4hive14HiveDataSource8addSplitESt10shared_ptrINS1_14ConnectorSplitEE
# 14 _ZN8facebook5velox4exec9TableScan9getOutputEv
# 15 _ZZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE3_clEv
# 16 _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 17 _ZN8facebook5velox4exec6Driver4nextEPN5folly10SemiFutureINS3_4UnitEEERPNS1_8OperatorERNS1_14BlockingReasonE
# 18 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 19 _ZN6gluten24WholeStageResultIterator4nextEv
# 20 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 21 0x00007f32e1017ab4
at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:41)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
at org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.gluten.iterator.IteratorsV1$ReadTimeAccumulator.hasNext(IteratorsV1.scala:127)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:32)
at org.apache.gluten.vectorized.ColumnarBatchInIterator.hasNext(ColumnarBatchInIterator.java:36)
at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:57)
at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:39)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
at org.apache.gluten.iterator.IteratorsV1$ReadTimeAccumulator.hasNext(IteratorsV1.scala:127)
at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.shuffle.ColumnarShuffleWriter.internalWrite(ColumnarShuffleWriter.scala:127)
at org.apache.spark.shuffle.ColumnarShuffleWriter.write(ColumnarShuffleWriter.scala:257)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:515)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1507)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:518)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
```
### Gluten version
_No response_
### Spark version
None
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
```bash
```