Miles Granger created SPARK-45676:
-------------------------------------
Summary: Upgrade to PySpark 3.5.0 gives Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
Key: SPARK-45676
URL: https://issues.apache.org/jira/browse/SPARK-45676
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.5.0
Reporter: Miles Granger
Using PySpark 3.4.1 with the following dependencies works fine for reading S3 files:
hadoop-client:3.3.4
hadoop-common:3.3.4
hadoop-aws:3.3.4
aws-java-sdk-bundle:1.12.262
A simple upgrade to PySpark 3.5.0 (which, AFAIK, still uses Hadoop 3.3.4) fails to read the same S3 files:
```
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
    at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
    at org.apache.parquet.hadoop.util.HadoopInputFile.fromStatus(HadoopInputFile.java:44)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:76)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readParquetFootersInParallel$1(ParquetFileFormat.scala:450)
    ... 14 more
```
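For anyone trying to reproduce this, here is a minimal sketch of one way the dependencies listed above might be supplied to a PySpark session. The report does not say how the jars are actually loaded, so the use of `spark.jars.packages`, the Maven group IDs, the app name, and the helper names `packages_conf` and `read_parquet` are all assumptions for illustration.

```python
# Maven coordinates for the dependencies named in the report.
# The group IDs are an assumption; the report lists only artifact:version.
DEPENDENCIES = [
    "org.apache.hadoop:hadoop-client:3.3.4",
    "org.apache.hadoop:hadoop-common:3.3.4",
    "org.apache.hadoop:hadoop-aws:3.3.4",
    "com.amazonaws:aws-java-sdk-bundle:1.12.262",
]


def packages_conf(deps=DEPENDENCIES):
    """Join coordinates into the comma-separated form spark.jars.packages expects."""
    return ",".join(deps)


def read_parquet(path):
    """Hypothetical helper: start a session with the jars above and read `path`."""
    # Imported lazily so the rest of the module works without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("spark-45676-repro")  # hypothetical app name
        .config("spark.jars.packages", packages_conf())
        .getOrCreate()
    )
    return spark.read.parquet(path)
```

Calling `read_parquet` with an `s3a://` path to a Parquet file would exercise the same `S3AFileSystem` lookup that appears in the trace, provided the session really does resolve the jars this way.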