Spark 3.0 and S3A

Nicholas Chammas Mon, 28 Oct 2019 08:35:24 -0700

Howdy folks,

I have a question about what is happening with the 3.0 release in relation
to Hadoop and hadoop-aws
<https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html>.


Today, among other builds, we release a build of Spark built against Hadoop
2.7 and another one built without Hadoop. In Spark 3+, will we continue to
release Hadoop 2.7 builds as one of the primary downloads on the download
page <http://spark.apache.org/downloads.html>? Or will we start building
Spark against a newer version of Hadoop?

I ask because successive versions of hadoop-aws have made
significant usability improvements to S3A. To get those, users need to
download the Hadoop-free build of Spark
<https://spark.apache.org/docs/latest/hadoop-provided.html> and then link
Spark to a version of Hadoop newer than 2.7. Pairing Spark built against
Hadoop 2.7 with hadoop-aws 2.8 or newer causes various dependency and
runtime issues.

If we start releasing builds of Spark built against Hadoop 3.2 (or another
recent version), users can get the latest S3A improvements via --packages
"org.apache.hadoop:hadoop-aws:3.2.1" without needing to download Hadoop
separately.
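
To make the --packages idea concrete, here is a rough sketch of how the
invocation could be assembled. It assumes a Spark build that already bundles
Hadoop 3.2.x, which is the hypothetical build discussed above. The helper
names and my_job.py are made up for illustration; only the Maven coordinate
and the --packages flag come from the paragraph above.

```python
import shlex
from typing import List


def hadoop_aws_coordinate(hadoop_version: str) -> str:
    """Return the Maven coordinate of hadoop-aws for a given Hadoop version."""
    return f"org.apache.hadoop:hadoop-aws:{hadoop_version}"


def spark_submit_command(app: str, hadoop_version: str) -> List[str]:
    """Build a spark-submit argument list that pulls in hadoop-aws.

    Assumption: hadoop_version matches the Hadoop version the Spark build
    was compiled against; mixing versions is what causes the dependency
    and runtime issues mentioned earlier.
    """
    return [
        "spark-submit",
        "--packages",
        hadoop_aws_coordinate(hadoop_version),
        app,
    ]


# my_job.py is a hypothetical application script.
cmd = spark_submit_command("my_job.py", "3.2.1")
print(shlex.join(cmd))
```

Keeping the version in one place makes it harder to end up with a
hadoop-aws that does not match the Hadoop jars on the classpath.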

Nick
