and @tgra...@apache.org <tgra...@apache.org> too

On Sat, 11 Dec 2021 at 21:38, Hyukjin Kwon <gurwls...@gmail.com> wrote:
> cc @Holden Karau <holden.ka...@gmail.com> @DB Tsai <dbt...@dbtsai.com>
> @Imran Rashid <iras...@apache.org> @Mridul Muralidharan <mri...@gmail.com> FYI
>
> On Thu, 9 Dec 2021 at 14:07, angers zhu <angers....@gmail.com> wrote:
>
>> Hi all,
>>
>> Since Spark 3.2, we have supported Hadoop 3.3.1, but the corresponding
>> profile is still named *hadoop-3.2* (likewise *hadoop-2.7*), which is
>> misleading. So we made a change in https://github.com/apache/spark/pull/34715.
>> Starting from Spark 3.3, the Hadoop profiles are *hadoop-2* and *hadoop-3*,
>> and the default profile is hadoop-3.
>>
>> Profile changes
>>
>> *hadoop-2.7* changed to *hadoop-2*
>> *hadoop-3.2* changed to *hadoop-3*
>>
>> Release tar file
>>
>> Spark 3.3.0 with profile hadoop-3: *spark-3.3.0-bin-hadoop3.tgz*
>> Spark 3.3.0 with profile hadoop-2: *spark-3.3.0-bin-hadoop2.tgz*
>>
>> For Spark 3.2.0, the release tar file was, for example,
>> *spark-3.2.0-bin-hadoop3.2.tgz*.
>>
>> Pip install option changes
>>
>> To install PySpark with a specific Hadoop version, set the
>> PYSPARK_HADOOP_VERSION environment variable as below (Hadoop 3):
>>
>> PYSPARK_HADOOP_VERSION=3 pip install pyspark
>>
>> For Hadoop 2:
>>
>> PYSPARK_HADOOP_VERSION=2 pip install pyspark
>>
>> Supported values in PYSPARK_HADOOP_VERSION are now:
>>
>> - without: Spark pre-built with user-provided Apache Hadoop
>> - 2: Spark pre-built for Apache Hadoop 2
>> - 3: Spark pre-built for Apache Hadoop 3.3 and later (default)
>>
>> Building Spark and specifying the Hadoop version
>> <https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn>
>>
>> You can specify the exact version of Hadoop to compile against through
>> the hadoop.version property. Example:
>>
>> ./build/mvn -Pyarn -Dhadoop.version=3.3.0 -DskipTests clean package
>>
>> or you can also specify the *hadoop-3* profile explicitly:
>>
>> ./build/mvn -Pyarn -Phadoop-3 -Dhadoop.version=3.3.0 -DskipTests clean package
>>
>> If you want to build with Hadoop 2.x, enable the *hadoop-2* profile:
>>
>> ./build/mvn -Phadoop-2 -Pyarn -Dhadoop.version=2.8.5 -DskipTests clean package
>>
>> Notes
>>
>> On the current master branch, if you continue to build with -Phadoop-2.7
>> or -Phadoop-3.2, Spark will silently fall back to the default Hadoop 3,
>> because Maven and SBT only warn about and then ignore non-existent
>> profiles. Please change the profiles to -Phadoop-2 or -Phadoop-3.
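For reference, the naming scheme described above is regular enough to express in a few lines. The following is only an illustrative sketch (the helper name and mapping dict are hypothetical, not Spark's actual installer code) of how a PYSPARK_HADOOP_VERSION value maps to a release tar file name under the new Spark 3.3 scheme:

```python
import os

# Hypothetical mapping from PYSPARK_HADOOP_VERSION values to the
# pre-built distribution suffixes described in the announcement.
# This is NOT Spark's actual install code, just an illustration.
_SUFFIXES = {
    "2": "hadoop2",               # Spark pre-built for Apache Hadoop 2
    "3": "hadoop3",               # Spark pre-built for Apache Hadoop 3.3+ (default)
    "without": "without-hadoop",  # Spark pre-built with user-provided Hadoop
}

def release_tarball(spark_version: str) -> str:
    """Return the release tar file name for the chosen Hadoop flavor."""
    flavor = os.environ.get("PYSPARK_HADOOP_VERSION", "3")
    if flavor not in _SUFFIXES:
        raise ValueError(f"unsupported PYSPARK_HADOOP_VERSION: {flavor!r}")
    return f"spark-{spark_version}-bin-{_SUFFIXES[flavor]}.tgz"

os.environ["PYSPARK_HADOOP_VERSION"] = "2"
print(release_tarball("3.3.0"))  # spark-3.3.0-bin-hadoop2.tgz
```

Compare this with the Spark 3.2.0 scheme, where the suffix repeated the old profile name (e.g. spark-3.2.0-bin-hadoop3.2.tgz).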