deepashreeraghu opened a new issue, #5963:
URL: https://github.com/apache/incubator-gluten/issues/5963
### Backend
VL (Velox)
### Bug description
[Expected behavior] - It should honor the s3a filesystem and be able to access
files.
[Actual behavior] - It fails with the below error:
```
Reason: No registered file system matched with file path 's3a://perf-data-chstest1/
```
I am using the jar released at
https://github.com/apache/incubator-gluten/releases/download/v1.1.1/gluten-velox-bundle-spark3.3_2.12-1.1.1.jar
### Spark version
Spark-3.3.x
### Spark configurations
```
"spark.hadoop.fs.s3a.endpoint": "s3.direct.jp-tok.cloud-object-storage.appdomain.cloud",
"spark.hadoop.fs.s3a.access.key": "XXX",
"spark.hadoop.fs.s3a.secret.key": "XXX",
"spark.plugins": "io.glutenproject.GlutenPlugin",
"spark.memory.offHeap.enabled": "true",
"spark.memory.offHeap.size": "20g",
"spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager"
```
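For context, the table above corresponds to a submission roughly like the sketch below (the bundle jar path, application jar, and credentials are placeholders; they are not taken verbatim from the report). One thing worth noting: the Velox backend only registers an S3 filesystem when the native library was built with S3 support, so a bundle built without it would fail exactly as reported even with correct `fs.s3a.*` settings. The `--enable_s3=ON` build flag is my reading of the Gluten build docs, not something confirmed in this issue.

```shell
# Hedged sketch of launching Spark 3.3 with the Gluten Velox bundle and the
# s3a settings from this report. Paths and key values are placeholders.
#
# Prerequisite (assumption, per the Gluten build docs): the native bundle must
# be compiled with S3 enabled, e.g.
#   ./dev/buildbundle-veloxbe.sh --enable_s3=ON
spark-submit \
  --jars gluten-velox-bundle-spark3.3_2.12-1.1.1.jar \
  --conf spark.plugins=io.glutenproject.GlutenPlugin \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=20g \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --conf spark.hadoop.fs.s3a.endpoint=s3.direct.jp-tok.cloud-object-storage.appdomain.cloud \
  --conf spark.hadoop.fs.s3a.access.key=XXX \
  --conf spark.hadoop.fs.s3a.secret.key=XXX \
  my_app.py
```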
### System information
_No response_
### Relevant logs
```bash
Reason: No registered file system matched with file path 's3a://perf-data-chstest1/rasika_data/parquet_db/customer/20230907_030906_00076_8uifi_60b91066-caf5-4356-abe1-b9fdabc81a4a'
Retriable: False
Context: Split [Hive: s3a://perf-data-chstest1/rasika_data/parquet_db/customer/20230907_030906_00076_8uifi_60b91066-caf5-4356-abe1-b9fdabc81a4a 0 - 134217728] Task Gluten_Stage_1_TID_1
Top-Level Context: Same as context.
Function: getFileSystem
File: /root/src/oap-project/gluten/ep/build-velox/build/velox_ep/velox/common/file/FileSystems.cpp
Line: 61
Stack trace:
# 0 _ZN8facebook5velox7process10StackTraceC1Ei
# 1 _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2 _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKSsEEvRKNS1_18VeloxCheckFailArgsET0_
# 3 _ZN8facebook5velox11filesystems13getFileSystemESt17basic_string_viewIcSt11char_traitsIcEESt10shared_ptrIKNS0_6ConfigEE
# 4 _ZN8facebook5velox19FileHandleGeneratorclERKSs
# 5 _ZN8facebook5velox13CachedFactoryISsSt10shared_ptrINS0_10FileHandleEENS0_19FileHandleGeneratorEE8generateERKSs
# 6 _ZN8facebook5velox9connector4hive11SplitReader12prepareSplitESt10shared_ptrINS0_6common14MetadataFilterEERNS0_4dwio6common17RuntimeStatisticsE
# 7 _ZN8facebook5velox9connector4hive14HiveDataSource8addSplitESt10shared_ptrINS1_14ConnectorSplitEE
# 8 _ZN8facebook5velox4exec9TableScan9getOutputEv
# 9 _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 10 _ZN8facebook5velox4exec6Driver4nextERSt10shared_ptrINS1_13BlockingStateEE
# 11 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 12 _ZN6gluten24WholeStageResultIterator4nextEv
# 13 Java_io_glutenproject_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 14 ffi_call_unix64
# 15 ffi_call_int
# 16 _ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread
# 17 bytecodeLoopCompressed
# 18 0x00000000001a3a72
at io.glutenproject.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:39)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
at io.glutenproject.utils.InvocationFlowProtection.hasNext(Iterators.scala:135)
at io.glutenproject.utils.IteratorCompleter.hasNext(Iterators.scala:69)
at io.glutenproject.utils.PayloadCloser.hasNext(Iterators.scala:35)
at io.glutenproject.utils.PipelineTimeAccumulator.hasNext(Iterators.scala:98)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator.isEmpty(Iterator.scala:387)
at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
at org.apache.spark.InterruptibleIterator.isEmpty(InterruptibleIterator.scala:28)
at io.glutenproject.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:116)
at io.glutenproject.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:80)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:855)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:855)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:839)
Caused by: java.lang.RuntimeException: Exception: VeloxRuntimeError
```