Arseniy Tashoyan created SPARK-42425:
----------------------------------------
Summary: spark-hadoop-cloud is not provided in the default Spark
distribution
Key: SPARK-42425
URL: https://issues.apache.org/jira/browse/SPARK-42425
Project: Spark
Issue Type: Bug
Components: Input/Output
Affects Versions: 3.3.1
Reporter: Arseniy Tashoyan
The spark-hadoop-cloud library and its dependencies, such as hadoop-aws, are
absent from the default Spark distribution. As a result, the dependency
management setup described in [Integration with Cloud
Infrastructures|https://spark.apache.org/docs/3.3.1/cloud-integration.html#installation]
does not work as documented: the cloud integration libraries are not actually
provided at runtime.
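
One way to confirm whether a given distribution ships these libraries is to scan its jars directory. The following is a minimal sketch, not part of the original report. It assumes that SPARK_HOME points at an unpacked distribution, that jars live under SPARK_HOME/jars, and that jar files are named "<artifact>-<version>.jar" or "<artifact>_<scala-version>-<version>.jar". The helper names find_cloud_jars and missing_cloud_jars are hypothetical.

```python
import os
from pathlib import Path

# Artifact name prefixes mentioned in the report. The jar naming pattern
# "<artifact>-<version>.jar" / "<artifact>_<scala>-<version>.jar" is an assumption.
EXPECTED_ARTIFACTS = ("spark-hadoop-cloud", "hadoop-aws")


def find_cloud_jars(jars_dir, artifacts=EXPECTED_ARTIFACTS):
    """Map each artifact prefix to the sorted jar file names that match it."""
    jars_dir = Path(jars_dir)
    # A missing directory is treated as "no jars found" rather than an error.
    names = sorted(p.name for p in jars_dir.glob("*.jar")) if jars_dir.is_dir() else []
    return {
        artifact: [
            name
            for name in names
            if name.startswith(artifact + "-") or name.startswith(artifact + "_")
        ]
        for artifact in artifacts
    }


def missing_cloud_jars(jars_dir, artifacts=EXPECTED_ARTIFACTS):
    """Return the artifact prefixes for which no matching jar was found."""
    found = find_cloud_jars(jars_dir, artifacts)
    return [artifact for artifact in artifacts if not found[artifact]]


if __name__ == "__main__":
    # SPARK_HOME is assumed to point at an unpacked Spark distribution.
    spark_home = os.environ.get("SPARK_HOME", ".")
    missing = missing_cloud_jars(Path(spark_home) / "jars")
    print("missing:", ", ".join(missing) if missing else "none")
```

If the report is accurate, running this against a default 3.3.1 distribution would be expected to list both artifacts as missing.
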
A naive workaround would be to add the spark-hadoop-cloud library as a
compile-scope dependency. However, this does not work because of Spark's
classloader hierarchy: the Spark system classloader does not see classes
loaded by the application classloader.
A proper fix would therefore be to enable the hadoop-cloud build profile by
default: -Phadoop-cloud