This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new b6f46ca  [SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile
b6f46ca is described below

commit b6f46ca29742029efea2790af7fdefbc2fcf52de
Author: Chao Sun <sunc...@apple.com>
AuthorDate: Fri Jan 15 14:06:50 2021 -0800

[SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile

### What changes were proposed in this pull request?

This:
1. switches Spark to use the shaded Hadoop clients, namely hadoop-client-api and hadoop-client-runtime, for Hadoop 3.x.
2. upgrades the built-in Hadoop 3.x version to Hadoop 3.2.2.

Note that for Hadoop 2.7 we'll still use the same modules, such as hadoop-client.

To keep the default Hadoop profile as hadoop-3.2, this defines the following Maven properties:
```
hadoop-client-api.artifact
hadoop-client-runtime.artifact
hadoop-client-minicluster.artifact
```
which default to:
```
hadoop-client-api
hadoop-client-runtime
hadoop-client-minicluster
```
but all switch to `hadoop-client` when the Hadoop profile is hadoop-2.7. A side effect of this is that the same dependency is imported multiple times, so the Maven enforcer rule `banDuplicatePomDependencyVersions` had to be disabled.

Besides the above, this makes the following changes:
- explicitly adds a few dependencies that used to come in transitively from the Hadoop jars but are not included in the shaded client jars.
- removes the use of `ProxyUriUtils.getPath` from `ApplicationMaster`, which is a server-side/private API.
- modifies `IsolatedClientLoader` to exclude `hadoop-auth` jars when the Hadoop version is 3.x. This change only matters when Hadoop classes are not shared with Spark (which is _mostly_ the case in tests).

### Why are the changes needed?

Hadoop 3.2.2 is released with new features and bug fixes, so it's good for the Spark community to adopt it.
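The `IsolatedClientLoader` change mentioned above boils down to selecting different Maven artifact coordinates depending on the Hadoop major version. A minimal standalone sketch of that selection logic (the object and method names here are illustrative, not part of Spark):

```scala
// Mirrors the jar-name selection the patch adds to IsolatedClientLoader:
// Hadoop 3.x resolves the shaded client artifacts (hadoop-client-api and
// hadoop-client-runtime), while older versions keep the classic
// hadoop-client module. The object name is hypothetical.
object HadoopClientCoordinates {
  def jarNames(hadoopVersion: String): Seq[String] =
    if (hadoopVersion.startsWith("3")) {
      // Shaded clients: relocated third-party deps, no Guava conflicts.
      Seq(
        s"org.apache.hadoop:hadoop-client-api:$hadoopVersion",
        s"org.apache.hadoop:hadoop-client-runtime:$hadoopVersion")
    } else {
      // Hadoop 2.x: the single non-shaded client module.
      Seq(s"org.apache.hadoop:hadoop-client:$hadoopVersion")
    }
}
```

For example, `jarNames("3.2.2")` yields the two shaded coordinates, while `jarNames("2.7.4")` yields only `org.apache.hadoop:hadoop-client:2.7.4`, matching the hadoop-2.7 profile behavior.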
However, recent Hadoop versions starting from Hadoop 3.2.1 have upgraded to use Guava 27+. In order to resolve the Guava conflicts, this takes the approach of switching to the shaded client jars provided by Hadoop. This also has the benefit of avoiding pulling in other third-party dependencies from the Hadoop side, and thus avoids more potential conflicts in the future.

### Does this PR introduce _any_ user-facing change?

When people use Spark with the `hadoop-provided` option, they should make sure the class path contains the `hadoop-client-api` and `hadoop-client-runtime` jars. In addition, they may need to make sure these jars appear before the other Hadoop jars on the class path. Otherwise, classes may be loaded from the non-shaded Hadoop jars and cause conflicts.

### How was this patch tested?

Relying on existing tests.

Closes #30701 from sunchao/test-hadoop-3.2.2.

Authored-by: Chao Sun <sunc...@apple.com>
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 common/network-yarn/pom.xml | 8 ++-
 core/pom.xml | 16 +++++-
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 3 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 53 +----------------
 external/kafka-0-10-assembly/pom.xml | 8 ++-
 external/kafka-0-10-sql/pom.xml | 4 ++
 external/kafka-0-10-token-provider/pom.xml | 5 ++
 external/kinesis-asl-assembly/pom.xml | 8 ++-
 hadoop-cloud/pom.xml | 7 ++-
 launcher/pom.xml | 9 ++-
 pom.xml | 59 +++++++++++++++----
 resource-managers/kubernetes/core/pom.xml | 9 ---
 resource-managers/yarn/pom.xml | 67 ++++++++++++++--------
 .../spark/deploy/yarn/ApplicationMaster.scala | 6 +-
 .../spark/deploy/yarn/BaseYarnClusterSuite.scala | 10 ++++
 sql/catalyst/pom.xml | 4 ++
 sql/hive/pom.xml | 5 ++
 .../sql/hive/client/IsolatedClientLoader.scala | 19 +++++-
 18 files changed, 191 insertions(+), 109 deletions(-)

diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 7aff79e..5036e05 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -65,7 +65,13 @@ <!-- Provided dependencies --> <dependency>
<groupId>org.apache.hadoop</groupId> - <artifactId>hadoop-client</artifactId> + <artifactId>${hadoop-client-api.artifact}</artifactId> + <version>${hadoop.version}</version> + </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>${hadoop-client-runtime.artifact}</artifactId> + <version>${hadoop.version}</version> </dependency> <dependency> <groupId>org.slf4j</groupId> diff --git a/core/pom.xml b/core/pom.xml index 09fa153..4b3e040 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -66,7 +66,13 @@ </dependency> <dependency> <groupId>org.apache.hadoop</groupId> - <artifactId>hadoop-client</artifactId> + <artifactId>${hadoop-client-api.artifact}</artifactId> + <version>${hadoop.version}</version> + </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>${hadoop-client-runtime.artifact}</artifactId> + <version>${hadoop.version}</version> </dependency> <dependency> <groupId>org.apache.spark</groupId> @@ -178,6 +184,14 @@ <artifactId>commons-text</artifactId> </dependency> <dependency> + <groupId>commons-io</groupId> + <artifactId>commons-io</artifactId> + </dependency> + <dependency> + <groupId>commons-collections</groupId> + <artifactId>commons-collections</artifactId> + </dependency> + <dependency> <groupId>com.google.code.findbugs</groupId> <artifactId>jsr305</artifactId> </dependency> diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 b/dev/deps/spark-deps-hadoop-2.7-hive-2.3 index 8d8ef2e..caede04 100644 --- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3 @@ -128,7 +128,7 @@ javassist/3.25.0-GA//javassist-3.25.0-GA.jar javax.inject/1//javax.inject-1.jar javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar javolution/5.5.1//javolution-5.5.1.jar -jaxb-api/2.2.2//jaxb-api-2.2.2.jar +jaxb-api/2.2.11//jaxb-api-2.2.11.jar jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar jdo-api/3.0.1//jdo-api-3.0.1.jar @@ -227,7 +227,6 @@ 
spire-macros_2.12/0.17.0-M1//spire-macros_2.12-0.17.0-M1.jar spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar -stax-api/1.0-2//stax-api-1.0-2.jar stax-api/1.0.1//stax-api-1.0.1.jar stream/2.9.6//stream-2.9.6.jar super-csv/2.2.0//super-csv-2.2.0.jar diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 b/dev/deps/spark-deps-hadoop-3.2-hive-2.3 index bf56fc1..344d8e5 100644 --- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3 @@ -3,7 +3,6 @@ JLargeArrays/1.5//JLargeArrays-1.5.jar JTransforms/3.1//JTransforms-3.1.jar RoaringBitmap/0.9.0//RoaringBitmap-0.9.0.jar ST4/4.0.4//ST4-4.0.4.jar -accessors-smart/1.2//accessors-smart-1.2.jar activation/1.1.1//activation-1.1.1.jar aircompressor/0.16//aircompressor-0.16.jar algebra_2.12/2.0.0-M2//algebra_2.12-2.0.0-M2.jar @@ -11,7 +10,6 @@ annotations/17.0.0//annotations-17.0.0.jar antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar antlr4-runtime/4.8-1//antlr4-runtime-4.8-1.jar aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar -aopalliance/1.0//aopalliance-1.0.jar arpack_combined_all/0.1//arpack_combined_all-0.1.jar arrow-format/2.0.0//arrow-format-2.0.0.jar arrow-memory-core/2.0.0//arrow-memory-core-2.0.0.jar @@ -28,15 +26,12 @@ breeze_2.12/1.0//breeze_2.12-1.0.jar cats-kernel_2.12/2.0.0-M4//cats-kernel_2.12-2.0.0-M4.jar chill-java/0.9.5//chill-java-0.9.5.jar chill_2.12/0.9.5//chill_2.12-0.9.5.jar -commons-beanutils/1.9.4//commons-beanutils-1.9.4.jar commons-cli/1.2//commons-cli-1.2.jar commons-codec/1.15//commons-codec-1.15.jar commons-collections/3.2.2//commons-collections-3.2.2.jar commons-compiler/3.0.16//commons-compiler-3.0.16.jar commons-compress/1.20//commons-compress-1.20.jar -commons-configuration2/2.1.1//commons-configuration2-2.1.1.jar commons-crypto/1.1.0//commons-crypto-1.1.0.jar -commons-daemon/1.0.13//commons-daemon-1.0.13.jar 
commons-dbcp/1.4//commons-dbcp-1.4.jar commons-httpclient/3.1//commons-httpclient-3.1.jar commons-io/2.5//commons-io-2.5.jar @@ -56,30 +51,13 @@ datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar derby/10.14.2.0//derby-10.14.2.0.jar -dnsjava/2.1.7//dnsjava-2.1.7.jar dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar -ehcache/3.3.1//ehcache-3.3.1.jar flatbuffers-java/1.9.0//flatbuffers-java-1.9.0.jar generex/1.0.2//generex-1.0.2.jar -geronimo-jcache_1.0_spec/1.0-alpha-1//geronimo-jcache_1.0_spec-1.0-alpha-1.jar gson/2.2.4//gson-2.2.4.jar guava/14.0.1//guava-14.0.1.jar -guice-servlet/4.0//guice-servlet-4.0.jar -guice/4.0//guice-4.0.jar -hadoop-annotations/3.2.0//hadoop-annotations-3.2.0.jar -hadoop-auth/3.2.0//hadoop-auth-3.2.0.jar -hadoop-client/3.2.0//hadoop-client-3.2.0.jar -hadoop-common/3.2.0//hadoop-common-3.2.0.jar -hadoop-hdfs-client/3.2.0//hadoop-hdfs-client-3.2.0.jar -hadoop-mapreduce-client-common/3.2.0//hadoop-mapreduce-client-common-3.2.0.jar -hadoop-mapreduce-client-core/3.2.0//hadoop-mapreduce-client-core-3.2.0.jar -hadoop-mapreduce-client-jobclient/3.2.0//hadoop-mapreduce-client-jobclient-3.2.0.jar -hadoop-yarn-api/3.2.0//hadoop-yarn-api-3.2.0.jar -hadoop-yarn-client/3.2.0//hadoop-yarn-client-3.2.0.jar -hadoop-yarn-common/3.2.0//hadoop-yarn-common-3.2.0.jar -hadoop-yarn-registry/3.2.0//hadoop-yarn-registry-3.2.0.jar -hadoop-yarn-server-common/3.2.0//hadoop-yarn-server-common-3.2.0.jar -hadoop-yarn-server-web-proxy/3.2.0//hadoop-yarn-server-web-proxy-3.2.0.jar +hadoop-client-api/3.2.2//hadoop-client-api-3.2.2.jar +hadoop-client-runtime/3.2.2//hadoop-client-runtime-3.2.2.jar hive-beeline/2.3.7//hive-beeline-2.3.7.jar hive-cli/2.3.7//hive-cli-2.3.7.jar hive-common/2.3.7//hive-common-2.3.7.jar @@ -109,8 +87,6 @@ jackson-core/2.11.4//jackson-core-2.11.4.jar 
jackson-databind/2.11.4//jackson-databind-2.11.4.jar jackson-dataformat-yaml/2.11.4//jackson-dataformat-yaml-2.11.4.jar jackson-datatype-jsr310/2.11.2//jackson-datatype-jsr310-2.11.2.jar -jackson-jaxrs-base/2.9.5//jackson-jaxrs-base-2.9.5.jar -jackson-jaxrs-json-provider/2.9.5//jackson-jaxrs-json-provider-2.9.5.jar jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar jackson-module-jaxb-annotations/2.11.4//jackson-module-jaxb-annotations-2.11.4.jar jackson-module-paranamer/2.11.4//jackson-module-paranamer-2.11.4.jar @@ -124,13 +100,10 @@ jakarta.ws.rs-api/2.1.6//jakarta.ws.rs-api-2.1.6.jar jakarta.xml.bind-api/2.3.2//jakarta.xml.bind-api-2.3.2.jar janino/3.0.16//janino-3.0.16.jar javassist/3.25.0-GA//javassist-3.25.0-GA.jar -javax.inject/1//javax.inject-1.jar javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar -javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar javolution/5.5.1//javolution-5.5.1.jar jaxb-api/2.2.11//jaxb-api-2.2.11.jar jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar -jcip-annotations/1.0-1//jcip-annotations-1.0-1.jar jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar jdo-api/3.0.1//jdo-api-3.0.1.jar jersey-client/2.30//jersey-client-2.30.jar @@ -144,30 +117,14 @@ jline/2.14.6//jline-2.14.6.jar joda-time/2.10.5//joda-time-2.10.5.jar jodd-core/3.5.2//jodd-core-3.5.2.jar jpam/1.1//jpam-1.1.jar -json-smart/2.3//json-smart-2.3.jar json/1.8//json-1.8.jar json4s-ast_2.12/3.7.0-M5//json4s-ast_2.12-3.7.0-M5.jar json4s-core_2.12/3.7.0-M5//json4s-core_2.12-3.7.0-M5.jar json4s-jackson_2.12/3.7.0-M5//json4s-jackson_2.12-3.7.0-M5.jar json4s-scalap_2.12/3.7.0-M5//json4s-scalap_2.12-3.7.0-M5.jar -jsp-api/2.1//jsp-api-2.1.jar jsr305/3.0.0//jsr305-3.0.0.jar jta/1.1//jta-1.1.jar jul-to-slf4j/1.7.30//jul-to-slf4j-1.7.30.jar -kerb-admin/1.0.1//kerb-admin-1.0.1.jar -kerb-client/1.0.1//kerb-client-1.0.1.jar -kerb-common/1.0.1//kerb-common-1.0.1.jar -kerb-core/1.0.1//kerb-core-1.0.1.jar -kerb-crypto/1.0.1//kerb-crypto-1.0.1.jar -kerb-identity/1.0.1//kerb-identity-1.0.1.jar 
-kerb-server/1.0.1//kerb-server-1.0.1.jar -kerb-simplekdc/1.0.1//kerb-simplekdc-1.0.1.jar -kerb-util/1.0.1//kerb-util-1.0.1.jar -kerby-asn1/1.0.1//kerby-asn1-1.0.1.jar -kerby-config/1.0.1//kerby-config-1.0.1.jar -kerby-pkix/1.0.1//kerby-pkix-1.0.1.jar -kerby-util/1.0.1//kerby-util-1.0.1.jar -kerby-xdr/1.0.1//kerby-xdr-1.0.1.jar kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar kubernetes-client/4.12.0//kubernetes-client-4.12.0.jar kubernetes-model-admissionregistration/4.12.0//kubernetes-model-admissionregistration-4.12.0.jar @@ -205,9 +162,7 @@ metrics-json/4.1.1//metrics-json-4.1.1.jar metrics-jvm/4.1.1//metrics-jvm-4.1.1.jar minlog/1.3.0//minlog-1.3.0.jar netty-all/4.1.51.Final//netty-all-4.1.51.Final.jar -nimbus-jose-jwt/4.41.1//nimbus-jose-jwt-4.41.1.jar objenesis/2.6//objenesis-2.6.jar -okhttp/2.7.5//okhttp-2.7.5.jar okhttp/3.12.12//okhttp-3.12.12.jar okio/1.14.0//okio-1.14.0.jar opencsv/2.3//opencsv-2.3.jar @@ -226,7 +181,6 @@ parquet-jackson/1.10.1//parquet-jackson-1.10.1.jar protobuf-java/2.5.0//protobuf-java-2.5.0.jar py4j/0.10.9.1//py4j-0.10.9.1.jar pyrolite/4.30//pyrolite-4.30.jar -re2j/1.1//re2j-1.1.jar scala-collection-compat_2.12/2.1.1//scala-collection-compat_2.12-2.1.1.jar scala-compiler/2.12.10//scala-compiler-2.12.10.jar scala-library/2.12.10//scala-library-2.12.10.jar @@ -244,15 +198,12 @@ spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar stax-api/1.0.1//stax-api-1.0.1.jar -stax2-api/3.1.4//stax2-api-3.1.4.jar stream/2.9.6//stream-2.9.6.jar super-csv/2.2.0//super-csv-2.2.0.jar threeten-extra/1.5.0//threeten-extra-1.5.0.jar -token-provider/1.0.1//token-provider-1.0.1.jar transaction-api/1.1//transaction-api-1.1.jar univocity-parsers/2.9.0//univocity-parsers-2.9.0.jar velocity/1.5//velocity-1.5.jar -woodstox-core/5.0.3//woodstox-core-5.0.3.jar xbean-asm7-shaded/4.15//xbean-asm7-shaded-4.15.jar xz/1.5//xz-1.5.jar 
zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar diff --git a/external/kafka-0-10-assembly/pom.xml b/external/kafka-0-10-assembly/pom.xml index 2359e99..121bc56 100644 --- a/external/kafka-0-10-assembly/pom.xml +++ b/external/kafka-0-10-assembly/pom.xml @@ -71,10 +71,16 @@ </dependency> <dependency> <groupId>org.apache.hadoop</groupId> - <artifactId>hadoop-client</artifactId> + <artifactId>${hadoop-client-api.artifact}</artifactId> + <version>${hadoop.version}</version> <scope>provided</scope> </dependency> <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>${hadoop-client-runtime.artifact}</artifactId> + <version>${hadoop.version}</version> + </dependency> + <dependency> <groupId>org.apache.avro</groupId> <artifactId>avro-mapred</artifactId> <classifier>${avro.mapred.classifier}</classifier> diff --git a/external/kafka-0-10-sql/pom.xml b/external/kafka-0-10-sql/pom.xml index 843f160..1833b35 100644 --- a/external/kafka-0-10-sql/pom.xml +++ b/external/kafka-0-10-sql/pom.xml @@ -80,6 +80,10 @@ <version>${kafka.version}</version> </dependency> <dependency> + <groupId>com.google.code.findbugs</groupId> + <artifactId>jsr305</artifactId> + </dependency> + <dependency> <groupId>org.apache.commons</groupId> <artifactId>commons-pool2</artifactId> <version>${commons-pool2.version}</version> diff --git a/external/kafka-0-10-token-provider/pom.xml b/external/kafka-0-10-token-provider/pom.xml index dbe2ab9..4ee09fa 100644 --- a/external/kafka-0-10-token-provider/pom.xml +++ b/external/kafka-0-10-token-provider/pom.xml @@ -59,6 +59,11 @@ <scope>test</scope> </dependency> <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>${hadoop-client-runtime.artifact}</artifactId> + <scope>${hadoop.deps.scope}</scope> + </dependency> + <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-tags_${scala.binary.version}</artifactId> </dependency> diff --git a/external/kinesis-asl-assembly/pom.xml b/external/kinesis-asl-assembly/pom.xml index 22259b0..9a98d7c 
100644 --- a/external/kinesis-asl-assembly/pom.xml +++ b/external/kinesis-asl-assembly/pom.xml @@ -91,10 +91,16 @@ </dependency> <dependency> <groupId>org.apache.hadoop</groupId> - <artifactId>hadoop-client</artifactId> + <artifactId>${hadoop-client-api.artifact}</artifactId> + <version>${hadoop.version}</version> <scope>provided</scope> </dependency> <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>${hadoop-client-runtime.artifact}</artifactId> + <version>${hadoop.version}</version> + </dependency> + <dependency> <groupId>org.apache.avro</groupId> <artifactId>avro-ipc</artifactId> <scope>provided</scope> diff --git a/hadoop-cloud/pom.xml b/hadoop-cloud/pom.xml index 03910ba..c0997e5 100644 --- a/hadoop-cloud/pom.xml +++ b/hadoop-cloud/pom.xml @@ -58,10 +58,15 @@ </dependency> <dependency> <groupId>org.apache.hadoop</groupId> - <artifactId>hadoop-client</artifactId> + <artifactId>${hadoop-client-api.artifact}</artifactId> <version>${hadoop.version}</version> <scope>provided</scope> </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>${hadoop-client-runtime.artifact}</artifactId> + <version>${hadoop.version}</version> + </dependency> <!-- the AWS module pulls in jackson; its transitive dependencies can create intra-jackson-module version problems. diff --git a/launcher/pom.xml b/launcher/pom.xml index 5da2a49..dd872f4 100644 --- a/launcher/pom.xml +++ b/launcher/pom.xml @@ -81,7 +81,14 @@ <!-- Not needed by the test code, but referenced by SparkSubmit which is used by the tests. 
--> <dependency> <groupId>org.apache.hadoop</groupId> - <artifactId>hadoop-client</artifactId> + <artifactId>${hadoop-client-api.artifact}</artifactId> + <version>${hadoop.version}</version> + <scope>test</scope> + </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>${hadoop-client-runtime.artifact}</artifactId> + <version>${hadoop.version}</version> <scope>test</scope> </dependency> </dependencies> diff --git a/pom.xml b/pom.xml index f921e35..26b5186 100644 --- a/pom.xml +++ b/pom.xml @@ -120,7 +120,7 @@ <sbt.project.name>spark</sbt.project.name> <slf4j.version>1.7.30</slf4j.version> <log4j.version>1.2.17</log4j.version> - <hadoop.version>3.2.0</hadoop.version> + <hadoop.version>3.2.2</hadoop.version> <protobuf.version>2.5.0</protobuf.version> <yarn.version>${hadoop.version}</yarn.version> <zookeeper.version>3.4.14</zookeeper.version> @@ -246,6 +246,15 @@ <parquet.test.deps.scope>test</parquet.test.deps.scope> <!-- + These default to Hadoop 3.x shaded client/minicluster jars, but are switched to hadoop-client + when the Hadoop profile is hadoop-2.7, because these are only available in 3.x. Note that, + as result we have to include the same hadoop-client dependency multiple times in hadoop-2.7. + --> + <hadoop-client-api.artifact>hadoop-client-api</hadoop-client-api.artifact> + <hadoop-client-runtime.artifact>hadoop-client-runtime</hadoop-client-runtime.artifact> + <hadoop-client-minicluster.artifact>hadoop-client-minicluster</hadoop-client-minicluster.artifact> + + <!-- Overridable test home. So that you can call individual pom files directly without things breaking. 
--> @@ -860,6 +869,11 @@ <version>2.0.1</version> </dependency> <dependency> + <groupId>javax.xml.bind</groupId> + <artifactId>jaxb-api</artifactId> + <version>2.2.11</version> + </dependency> + <dependency> <groupId>org.scalanlp</groupId> <artifactId>breeze_${scala.binary.version}</artifactId> <version>1.0</version> @@ -1067,6 +1081,26 @@ <version>${curator.version}</version> <scope>test</scope> </dependency> + <!-- Hadoop 3.x dependencies --> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-client-api</artifactId> + <version>${hadoop.version}</version> + <scope>${hadoop.deps.scope}</scope> + </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-client-runtime</artifactId> + <version>${hadoop.version}</version> + <scope>${hadoop.deps.scope}</scope> + </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-client-minicluster</artifactId> + <version>${yarn.version}</version> + <scope>test</scope> + </dependency> + <!-- End of Hadoop 3.x dependencies --> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> @@ -1657,6 +1691,14 @@ <artifactId>ant</artifactId> </exclusion> <exclusion> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-common</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-auth</artifactId> + </exclusion> + <exclusion> <groupId>org.apache.zookeeper</groupId> <artifactId>zookeeper</artifactId> </exclusion> @@ -2420,17 +2462,6 @@ </rules> </configuration> </execution> - <execution> - <id>enforce-no-duplicate-dependencies</id> - <goals> - <goal>enforce</goal> - </goals> - <configuration> - <rules> - <banDuplicatePomDependencyVersions/> - </rules> - </configuration> - </execution> </executions> </plugin> <plugin> @@ -2901,6 +2932,7 @@ <artifactId>maven-shade-plugin</artifactId> <configuration> <shadedArtifactAttached>false</shadedArtifactAttached> + 
<createDependencyReducedPom>false</createDependencyReducedPom> <artifactSet> <includes> <include>org.spark-project.spark:unused</include> @@ -3162,6 +3194,9 @@ <hadoop.version>2.7.4</hadoop.version> <curator.version>2.7.1</curator.version> <commons-io.version>2.4</commons-io.version> + <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact> + <hadoop-client-runtime.artifact>hadoop-client</hadoop-client-runtime.artifact> + <hadoop-client-minicluster.artifact>hadoop-client</hadoop-client-minicluster.artifact> </properties> </profile> diff --git a/resource-managers/kubernetes/core/pom.xml b/resource-managers/kubernetes/core/pom.xml index 44df4e1..3fff940 100644 --- a/resource-managers/kubernetes/core/pom.xml +++ b/resource-managers/kubernetes/core/pom.xml @@ -64,10 +64,6 @@ <artifactId>*</artifactId> </exclusion> <exclusion> - <groupId>com.fasterxml.jackson.module</groupId> - <artifactId>jackson-module-jaxb-annotations</artifactId> - </exclusion> - <exclusion> <groupId>com.fasterxml.jackson.dataformat</groupId> <artifactId>jackson-dataformat-yaml</artifactId> </exclusion> @@ -85,11 +81,6 @@ <artifactId>jackson-dataformat-yaml</artifactId> <version>${fasterxml.jackson.version}</version> </dependency> - <dependency> - <groupId>com.fasterxml.jackson.module</groupId> - <artifactId>jackson-module-jaxb-annotations</artifactId> - <version>${fasterxml.jackson.version}</version> - </dependency> <!-- Explicitly depend on shaded dependencies from the parent, since shaded deps aren't transitive --> <dependency> diff --git a/resource-managers/yarn/pom.xml b/resource-managers/yarn/pom.xml index c0ce1c8..a662953 100644 --- a/resource-managers/yarn/pom.xml +++ b/resource-managers/yarn/pom.xml @@ -40,6 +40,42 @@ <spark.yarn.isHadoopProvided>true</spark.yarn.isHadoopProvided> </properties> </profile> + <profile> + <id>hadoop-2.7</id> + <dependencies> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-yarn-api</artifactId> + </dependency> + 
<dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-yarn-common</artifactId> + </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-yarn-server-web-proxy</artifactId> + </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-yarn-client</artifactId> + </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-yarn-server-tests</artifactId> + <classifier>tests</classifier> + <scope>test</scope> + </dependency> + <!-- + Hack to exclude org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:tests. + See the parent pom.xml for more details. + --> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-yarn-server-resourcemanager</artifactId> + <scope>test</scope> + </dependency> + </dependencies> + </profile> </profiles> <dependencies> @@ -69,23 +105,20 @@ </dependency> <dependency> <groupId>org.apache.hadoop</groupId> - <artifactId>hadoop-yarn-api</artifactId> - </dependency> - <dependency> - <groupId>org.apache.hadoop</groupId> - <artifactId>hadoop-yarn-common</artifactId> + <artifactId>${hadoop-client-api.artifact}</artifactId> + <version>${hadoop.version}</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> - <artifactId>hadoop-yarn-server-web-proxy</artifactId> + <artifactId>${hadoop-client-runtime.artifact}</artifactId> + <version>${hadoop.version}</version> + <scope>${hadoop.deps.scope}</scope> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> - <artifactId>hadoop-yarn-client</artifactId> - </dependency> - <dependency> - <groupId>org.apache.hadoop</groupId> - <artifactId>hadoop-client</artifactId> + <artifactId>${hadoop-client-minicluster.artifact}</artifactId> + <version>${hadoop.version}</version> + <scope>test</scope> </dependency> <!-- Explicit listing of transitive deps that are shaded. Otherwise, odd compiler crashes. 
--> @@ -136,18 +169,6 @@ </dependency> <dependency> - <groupId>org.apache.hadoop</groupId> - <artifactId>hadoop-yarn-server-tests</artifactId> - <classifier>tests</classifier> - <scope>test</scope> - </dependency> - <dependency> - <groupId>org.apache.hadoop</groupId> - <artifactId>hadoop-yarn-server-resourcemanager</artifactId> - <scope>test</scope> - </dependency> - - <dependency> <groupId>org.mockito</groupId> <artifactId>mockito-core</artifactId> <scope>test</scope> diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala index ab69507..eb927a3 100644 --- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala +++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala @@ -19,7 +19,7 @@ package org.apache.spark.deploy.yarn import java.io.{File, IOException} import java.lang.reflect.{InvocationTargetException, Modifier} -import java.net.{URI, URL} +import java.net.{URI, URL, URLEncoder} import java.security.PrivilegedExceptionAction import java.util.concurrent.{TimeoutException, TimeUnit} @@ -36,7 +36,6 @@ import org.apache.hadoop.yarn.api._ import org.apache.hadoop.yarn.api.records._ import org.apache.hadoop.yarn.conf.YarnConfiguration import org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException -import org.apache.hadoop.yarn.server.webproxy.ProxyUriUtils import org.apache.hadoop.yarn.util.{ConverterUtils, Records} import org.apache.spark._ @@ -308,7 +307,8 @@ private[spark] class ApplicationMaster( // The client-mode AM doesn't listen for incoming connections, so report an invalid port. 
registerAM(Utils.localHostName, -1, sparkConf, sparkConf.getOption("spark.driver.appUIAddress"), appAttemptId) - addAmIpFilter(Some(driverRef), ProxyUriUtils.getPath(appAttemptId.getApplicationId)) + val encodedAppId = URLEncoder.encode(appAttemptId.getApplicationId.toString, "UTF-8") + addAmIpFilter(Some(driverRef), s"/proxy/$encodedAppId") createAllocator(driverRef, sparkConf, clientRpcEnv, appAttemptId, cachedResourcesConf) reporterThread.join() } catch { diff --git a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala index 20f5339..a813b99 100644 --- a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala +++ b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala @@ -80,6 +80,16 @@ abstract class BaseYarnClusterSuite yarnConf.set("yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage", "100.0") + // capacity-scheduler.xml is missing in hadoop-client-minicluster so this is a workaround + yarnConf.set("yarn.scheduler.capacity.root.queues", "default") + yarnConf.setInt("yarn.scheduler.capacity.root.default.capacity", 100) + yarnConf.setFloat("yarn.scheduler.capacity.root.default.user-limit-factor", 1) + yarnConf.setInt("yarn.scheduler.capacity.root.default.maximum-capacity", 100) + yarnConf.set("yarn.scheduler.capacity.root.default.state", "RUNNING") + yarnConf.set("yarn.scheduler.capacity.root.default.acl_submit_applications", "*") + yarnConf.set("yarn.scheduler.capacity.root.default.acl_administer_queue", "*") + yarnConf.setInt("yarn.scheduler.capacity.node-locality-delay", -1) + yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1) yarnCluster.init(yarnConf) yarnCluster.start() diff --git a/sql/catalyst/pom.xml b/sql/catalyst/pom.xml index 0553438..583738b 100644 --- a/sql/catalyst/pom.xml +++ 
b/sql/catalyst/pom.xml @@ -105,6 +105,10 @@ <artifactId>antlr4-runtime</artifactId> </dependency> <dependency> + <groupId>javax.xml.bind</groupId> + <artifactId>jaxb-api</artifactId> + </dependency> + <dependency> <groupId>commons-codec</groupId> <artifactId>commons-codec</artifactId> </dependency> diff --git a/sql/hive/pom.xml b/sql/hive/pom.xml index 27d2756..74b1f9d 100644 --- a/sql/hive/pom.xml +++ b/sql/hive/pom.xml @@ -163,6 +163,11 @@ <artifactId>datanucleus-core</artifactId> </dependency> <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>${hadoop-client-runtime.artifact}</artifactId> + <scope>${hadoop.deps.scope}</scope> + </dependency> + <dependency> <groupId>org.apache.thrift</groupId> <artifactId>libthrift</artifactId> </dependency> diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala index 02bf865..4e5e58d 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala @@ -112,11 +112,24 @@ private[hive] object IsolatedClientLoader extends Logging { hadoopVersion: String, ivyPath: Option[String], remoteRepos: String): Seq[URL] = { + val hadoopJarNames = if (hadoopVersion.startsWith("3")) { + Seq(s"org.apache.hadoop:hadoop-client-api:$hadoopVersion", + s"org.apache.hadoop:hadoop-client-runtime:$hadoopVersion") + } else { + Seq(s"org.apache.hadoop:hadoop-client:$hadoopVersion") + } val hiveArtifacts = version.extraDeps ++ Seq("hive-metastore", "hive-exec", "hive-common", "hive-serde") .map(a => s"org.apache.hive:$a:${version.fullVersion}") ++ - Seq("com.google.guava:guava:14.0.1", - s"org.apache.hadoop:hadoop-client:$hadoopVersion") + Seq("com.google.guava:guava:14.0.1") ++ hadoopJarNames + + val extraExclusions = if (hadoopVersion.startsWith("3")) { + // this introduced from lower version 
of Hive could conflict with jars in Hadoop 3.2+, so + // exclude here in favor of the ones in Hadoop 3.2+ + Seq("org.apache.hadoop:hadoop-auth") + } else { + Seq.empty + } val classpaths = quietly { SparkSubmitUtils.resolveMavenCoordinates( @@ -125,7 +138,7 @@ private[hive] object IsolatedClientLoader extends Logging { Some(remoteRepos), ivyPath), transitive = true, - exclusions = version.exclusions) + exclusions = version.exclusions ++ extraExclusions) } val allFiles = classpaths.map(new File(_)).toSet --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org