bmorck opened a new issue, #8170:
URL: https://github.com/apache/incubator-gluten/issues/8170
### Backend
VL (Velox)
### Bug description
When trying to read Parquet files using Gluten with the Velox backend, I get the
error `Reason: Failed to get S3 object due to: 'Network connection'.
Path:'s3://netflix-dataoven-test-users/bmorck/tpch_parquet/lineitem.parquet',
SDK Error Type:99, HTTP Status Code:-1, S3 Service:'Unknown',
Message:'curlCode: 77, Problem with the SSL CA cert (path? access rights?)',
RequestID:''.`
We are able to read the same files without Gluten. We are using an instance
profile directly on EC2.
I've seen similar threads suggest modifying the `spark.hadoop.fs.s3a`
configurations, but this doesn't seem to fix the issue. Any idea what might be
going on?
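For what it's worth, curl error 77 (`CURLE_SSL_CACERT_BADFILE`) means libcurl could not open or read the CA certificate bundle, so one thing worth checking is whether the executor host/container image actually ships CA certificates. A diagnostic sketch (the bundle paths below are the common Debian/Ubuntu and RHEL/Amazon Linux defaults, which may differ on your image):

```shell
# Report which well-known CA bundle paths exist and are readable on this host.
# curlCode 77 typically means neither path (nor whatever path the AWS C++ SDK
# was built to expect) is present/readable inside the executor environment.
for p in /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt; do
  if [ -r "$p" ]; then
    echo "found CA bundle: $p"
  else
    echo "missing: $p"
  fi
done
```

If both are missing, installing the distro's `ca-certificates` package into the executor image is a likely first step.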
### Spark version
Spark-3.3.x
### Spark configurations
spark.hadoop.fs.s3a.aws.credentials.provider = <internal credential provider>
spark.hadoop.fs.s3a.endpoint = s3-external-1.amazonaws.com
spark.hadoop.fs.s3a.use.instance.credentials = true
spark.hadoop.fs.s3a.connection.ssl.enabled = true
spark.hadoop.fs.s3a.path.style.access = false
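Since the `spark.hadoop.fs.s3a` settings above only affect the JVM/Hadoop path and not Velox's native AWS C++ SDK client, one hedged workaround to try is pointing the native client at an explicit CA bundle via executor environment variables (`spark.executorEnv.*` is standard Spark; whether the SDK build inside Velox honors `AWS_CA_BUNDLE`/`CURL_CA_BUNDLE` is an assumption, and the path is the Debian default):

```shell
# Sketch only: export a CA bundle path into the executor environment so the
# native S3 client can find it. Adjust the path to your base image.
spark-submit \
  --conf spark.executorEnv.AWS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt \
  --conf spark.executorEnv.CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt \
  ...
```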
### System information
_No response_
### Relevant logs
```bash
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Failed to get S3 object due to: 'Network connection'.
Path:'s3://<bucket>/bmorck/tpch_parquet/lineitem.parquet', SDK Error Type:99,
HTTP Status Code:-1, S3 Service:'Unknown', Message:'curlCode: 77, Problem with
the SSL CA cert (path? access rights?)', RequestID:''.
Retriable: False
Context: Split [Hive: s3a://<bucket>/bmorck/tpch_parquet/lineitem.parquet
15837691904 - 268435456] Task Gluten_Stage_8_TID_240_VTID_4
Additional Context: Operator: TableScan[0] 0
Function: preadInternal
File: /root/src/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/connectors/hive/storage_adapters/s3fs/S3FileSystem.cpp
Line: 184
Stack trace:
# 0
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9
# 10
# 11
# 12
# 13
# 14
# 15
# 16
# 17
# 18
# 19
# 20
# 21
# 22
at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:39)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
at org.apache.gluten.utils.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
at org.apache.gluten.utils.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
at org.apache.gluten.utils.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
at org.apache.gluten.utils.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator.isEmpty(Iterator.scala:387)
at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
at org.apache.spark.InterruptibleIterator.isEmpty(InterruptibleIterator.scala:28)
at org.apache.gluten.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:108)
at org.apache.gluten.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:79)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:868)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:868)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:378)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:342)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:378)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:342)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:568)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1537)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:571)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]