[FLINK-7973] Fix shading and relocating Hadoop for the S3 filesystems

- do not shade everything, especially not JDK classes!
  -> instead, define include patterns explicitly
- do not shade core Flink classes (only those imported from flink-hadoop-fs)
- hack around Hadoop loading (unshaded/non-relocated) classes based on names in
  core-default.xml by overwriting the Configuration class (we may need to extend
  this for mapred-default.xml and hdfs-default.xml):
  -> provide a core-default-shaded.xml file with shaded class names, and copy and
  adapt the Configuration class of the respective Hadoop version to load this
  file instead of core-default.xml.
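The following Java sketch illustrates how core-default-shaded.xml relates to core-default.xml. It is a minimal sketch under stated assumptions, not the procedure used for this commit. The class `CoreDefaultShader` is hypothetical. The prefix `org.apache.flink.fs.s3hadoop.shaded.` is the one used by flink-s3-fs-hadoop; flink-s3-fs-presto relocates to `org.apache.flink.fs.s3presto.shaded` instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that derives core-default-shaded.xml from Hadoop's core-default.xml. */
final class CoreDefaultShader {

	/** Relocation prefix of flink-s3-fs-hadoop (assumed to match the pom's shadedPattern). */
	static final String SHADED_PREFIX = "org.apache.flink.fs.s3hadoop.shaded.";

	// Matches "org.apache.hadoop" unless it directly follows ".shaded.", so names
	// that are already relocated are left alone (fixed-length lookbehind).
	private static final Pattern HADOOP_PACKAGE =
		Pattern.compile("(?<!\\.shaded\\.)org\\.apache\\.hadoop");

	private CoreDefaultShader() {
	}

	/** Prefixes every not-yet-relocated "org.apache.hadoop" reference in the XML text. */
	static String relocate(String xml) {
		String replacement = Matcher.quoteReplacement(SHADED_PREFIX + "org.apache.hadoop");
		return HADOOP_PACKAGE.matcher(xml).replaceAll(replacement);
	}

	/** Usage: java CoreDefaultShader core-default.xml core-default-shaded.xml */
	public static void main(String[] args) throws IOException {
		if (args.length != 2) {
			System.err.println("usage: CoreDefaultShader <core-default.xml> <core-default-shaded.xml>");
			return;
		}
		Path in = Paths.get(args[0]);
		Path out = Paths.get(args[1]);
		String xml = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
		Files.write(out, relocate(xml).getBytes(StandardCharsets.UTF_8));
	}
}
```

The negative lookbehind skips names that are already relocated, so running the rewrite twice does not produce a doubly prefixed class name. A plain string replacement would do that, because the relocated package name itself contains `org.apache.hadoop`.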

Add checkstyle suppression pattern for the Hadoop Configuration classes

Also fix the (integration) tests, which failed because they tried to load the
relocated classes, which are apparently not available there

Remove minimizeJar from shading of flink-s3-fs-presto because this was causing
"java.lang.ClassNotFoundException: org.apache.flink.fs.s3presto.shaded.org.apache.commons.logging.impl.LogFactoryImpl",
since these classes are not statically imported and are thus removed when
minimizing.

Fix s3-fs-presto not shading org.HdrHistogram

Fix log4j being relocated in the S3 fs implementations

Add shading checks to travis

Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/25a28ab3
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/25a28ab3
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/25a28ab3

Branch: refs/heads/release-1.4
Commit: 25a28ab32609c45fb8c40f717148e32fb453d2fc
Parents: ce1cb8f
Author: Nico Kruber
Authored: Mon Nov 6 19:53:37 2017 +0100
Committer: Aljoscha Krettek
Committed: Mon Nov 13 17:41:15 2017 +0100

----------------------------------------------------------------------
 flink-filesystems/flink-s3-fs-hadoop/README.md |   27 +
 flink-filesystems/flink-s3-fs-hadoop/pom.xml   |   84 +-
 .../org/apache/hadoop/conf/Configuration.java  | 3002 ++++++++++++++++++
 .../src/main/resources/core-default-shaded.xml | 2312 ++++++++++++++
 .../src/test/resources/core-site.xml           | 2312 ++++++++++++++
 flink-filesystems/flink-s3-fs-presto/README.md |   28 +
 flink-filesystems/flink-s3-fs-presto/pom.xml   |   73 +-
 .../org/apache/hadoop/conf/Configuration.java  | 2951 +++++++++++++++++
 .../src/main/resources/core-default-shaded.xml | 1978 ++++++++++++
 .../src/test/resources/core-site.xml           | 1978 ++++++++++++
 tools/maven/suppressions.xml                   |    4 +
 tools/travis_mvn_watchdog.sh                   |   53 +-
 12 files changed, 14778 insertions(+), 24 deletions(-)
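The minimizeJar failure and the Configuration workaround have the same cause: some classes are resolved from strings at runtime, and static analysis of bytecode cannot follow those lookups. The minimal Java sketch below shows such a lookup. The class `ReflectiveLookup` is hypothetical. If a name no longer exists on the classpath, because it was relocated or dropped by minimizeJar, the problem only surfaces as a `ClassNotFoundException` when the class is looked up.

```java
/** Hypothetical demo: resolving classes by name, as configuration-driven loaders do. */
final class ReflectiveLookup {

	private ReflectiveLookup() {
	}

	/**
	 * Returns true if a class with exactly this binary name is visible to the loader.
	 * Class.forName(name, false, loader) does not run static initializers.
	 */
	static boolean isLoadable(String className, ClassLoader loader) {
		try {
			Class.forName(className, false, loader);
			return true;
		} catch (ClassNotFoundException e) {
			// Typical causes here: the name came from a resource (such as core-default.xml)
			// that relocation did not rewrite, or minimizeJar removed the class because
			// nothing referenced it statically.
			return false;
		}
	}

	public static void main(String[] args) {
		ClassLoader loader = ReflectiveLookup.class.getClassLoader();
		String[] names = {
			"java.util.ArrayList",
			"org.apache.flink.fs.s3presto.shaded.org.apache.commons.logging.impl.LogFactoryImpl"
		};
		for (String name : names) {
			System.out.println(name + " -> " + (isLoadable(name, loader) ? "found" : "ClassNotFoundException"));
		}
	}
}
```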
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/flink/blob/25a28ab3/flink-filesystems/flink-s3-fs-hadoop/README.md
----------------------------------------------------------------------
diff --git a/flink-filesystems/flink-s3-fs-hadoop/README.md b/flink-filesystems/flink-s3-fs-hadoop/README.md
new file mode 100644
index 0000000..3ad90e3
--- /dev/null
+++ b/flink-filesystems/flink-s3-fs-hadoop/README.md
@@ -0,0 +1,27 @@
+This project is a wrapper around Hadoop's s3a file system. By pulling a smaller dependency tree and
+shading all dependencies away, this keeps the appearance of Flink being Hadoop-free,
+from a dependency perspective.
+
+We also relocate the shaded Hadoop version to allow running in a different
+setup. For this to work, however, we needed to adapt Hadoop's `Configuration`
+class to load a (shaded) `core-default-shaded.xml` configuration with the
+relocated class names of classes loaded via reflection
+(in the future, we may need to extend this to `mapred-default.xml` and `hdfs-default.xml` and their respective configuration classes).
+
+# Changing the Hadoop Version
+
+If you want to change the Hadoop version this project depends on, the following
+steps are required to keep the shading correct:
+
+1. copy `org/apache/hadoop/conf/Configuration.java` from the respective Hadoop jar file to this project
+   - adapt the `Configuration` class by replacing `core-default.xml` with `core-default-shaded.xml`.
+2. copy `core-default.xml` from the respective Hadoop jar file to this project as
+   - `src/main/resources/core-default-shaded.xml` (replacing every occurrence of `org.apache.hadoop` with `org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop`)
+   - `src/test/resources/core-site.xml` (as is)
+3. verify the shaded jar:
+   - does not contain any unshaded classes except for `org.apache.flink.fs.s3hadoop.S3FileSystemFactory`
+   - all other classes should be under `org.apache.flink.fs.s3hadoop.shaded`
+   - there should be a `META-INF/services/org.apache.flink.core.fs.FileSystemFactory` file pointing to the `org.apache.flink.fs.s3hadoop.S3FileSystemFactory` class
+   - other service files under `META-INF/services` should have their names and contents in the relocated `org.apache.flink.fs.s3hadoop.shaded` package
+   - contains a `core-default-shaded.xml` file
+   - does not contain a `core-default.xml` or `core-site.xml` file

http://git-wip-us.apache.org/repos/asf/flink/blob/25a28ab3/flink-filesystems/flink-s3-fs-hadoop/pom.xml
----------------------------------------------------------------------
diff --git a/flink-filesystems/flink-s3-fs-hadoop/pom.xml b/flink-filesystems/flink-s3-fs-hadoop/pom.xml
index de7921d..7ffe821 100644
--- a/flink-filesystems/flink-s3-fs-hadoop/pom.xml
+++ b/flink-filesystems/flink-s3-fs-hadoop/pom.xml
@@ -33,6 +33,7 @@ under the License.
 	<packaging>jar</packaging>
 
 	<properties>
+		<!-- Do not change this without updating the copied Configuration class! -->
 		<s3hadoop.hadoop.version>2.8.1</s3hadoop.hadoop.version>
 		<s3hadoop.aws.version>1.11.95</s3hadoop.aws.version>
 	</properties>
@@ -234,28 +235,87 @@ under the License.
 							</artifactSet>
 							<relocations>
 								<relocation>
-									<pattern>org</pattern>
-									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org</shadedPattern>
+									<pattern>com.amazonaws</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.amazonaws</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>com.fasterxml</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.fasterxml</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>com.google</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.google</shadedPattern>
+									<excludes>
+										<!-- provided -->
+										<exclude>com.google.code.findbugs.**</exclude>
+									</excludes>
+								</relocation>
+								<relocation>
+									<pattern>com.nimbusds</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.nimbusds</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>com.squareup</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.squareup</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>net.jcip</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.net.jcip</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>net.minidev</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.net.minidev</shadedPattern>
+								</relocation>
+
+								<!-- relocate everything from the flink-hadoop-fs project -->
+								<relocation>
+									<pattern>org.apache.flink.runtime.fs.hdfs</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>org.apache.flink.runtime.util</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util</shadedPattern>
+									<includes>
+										<include>org.apache.flink.runtime.util.**Hadoop*</include>
+									</includes>
+								</relocation>
+
+								<relocation>
+									<pattern>org.apache</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.apache</shadedPattern>
 									<excludes>
-										<exclude>org.apache.flink.core.fs.FileSystemFactory</exclude>
-										<exclude>org.apache.flink.fs.s3hadoop.**</exclude>
+										<!-- keep all other classes of flink as they are (exceptions above) -->
+										<exclude>org.apache.flink.**</exclude>
+										<exclude>org.apache.log4j.**</exclude> <!-- provided -->
 									</excludes>
 								</relocation>
 								<relocation>
-									<pattern>com</pattern>
-									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com</shadedPattern>
+									<pattern>org.codehaus</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.codehaus</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>org.joda</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.joda</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>org.mortbay</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.mortbay</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>org.tukaani</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.tukaani</shadedPattern>
 								</relocation>
 								<relocation>
-									<pattern>net</pattern>
-									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.net</shadedPattern>
+									<pattern>org.znerd</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.znerd</shadedPattern>
 								</relocation>
 								<relocation>
 									<pattern>okio</pattern>
 									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.okio</shadedPattern>
 								</relocation>
 								<relocation>
-									<pattern>software</pattern>
-									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.software</shadedPattern>
+									<pattern>software.amazon</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.software.amazon</shadedPattern>
 								</relocation>
 							</relocations>
 							<filters>
@@ -277,6 +337,10 @@ under the License.
 										<exclude>META-INF/maven/org.apache.commons/**</exclude>
 										<exclude>META-INF/maven/org.apache.flink/flink-hadoop-fs/**</exclude>
 										<exclude>META-INF/maven/org.apache.flink/force-shading/**</exclude>
+										<!-- we use our own "shaded" core-default.xml: core-default-shaded.xml -->
+										<exclude>core-default.xml</exclude>
+										<!-- we only add a core-site.xml with unshaded classnames for the unit tests -->
+										<exclude>core-site.xml</exclude>
 									</excludes>
 								</filter>
 							</filters>
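The following is a deliberately simplified Java model of how a relocation with `pattern`, `includes` and `excludes` maps a class name. It is meant to make the relocation rules above more concrete and is not maven-shade-plugin's implementation. Several details are assumptions of this sketch: the `RelocationModel` class, its glob syntax (`**` matches anything, `*` matches anything except a dot) and the first-match-wins ordering. Only two of the rules from the pom are reproduced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical, simplified model of package relocation; not maven-shade-plugin code. */
final class RelocationModel {

	/** One rule: a package prefix, its replacement prefix and optional glob filters. */
	static final class Rule {
		final String pattern;
		final String shadedPattern;
		final List<Pattern> includes;
		final List<Pattern> excludes;

		Rule(String pattern, String shadedPattern, List<String> includes, List<String> excludes) {
			this.pattern = pattern;
			this.shadedPattern = shadedPattern;
			this.includes = compileAll(includes);
			this.excludes = compileAll(excludes);
		}

		/** True if the class lies below the prefix, passes the includes and hits no exclude. */
		boolean matches(String className) {
			if (!className.startsWith(pattern + ".")) {
				return false;
			}
			boolean included = includes.isEmpty() || anyMatches(includes, className);
			return included && !anyMatches(excludes, className);
		}
	}

	private RelocationModel() {
	}

	/** Model-specific glob: "**" matches any characters, "*" any characters except '.'. */
	static Pattern glob(String glob) {
		StringBuilder regex = new StringBuilder();
		for (int i = 0; i < glob.length(); i++) {
			char c = glob.charAt(i);
			if (c == '*' && i + 1 < glob.length() && glob.charAt(i + 1) == '*') {
				regex.append(".*");
				i++; // skip the second '*'
			} else if (c == '*') {
				regex.append("[^.]*");
			} else {
				regex.append(Pattern.quote(String.valueOf(c)));
			}
		}
		return Pattern.compile(regex.toString());
	}

	private static List<Pattern> compileAll(List<String> globs) {
		List<Pattern> patterns = new ArrayList<>();
		for (String g : globs) {
			patterns.add(glob(g));
		}
		return patterns;
	}

	private static boolean anyMatches(List<Pattern> patterns, String className) {
		for (Pattern p : patterns) {
			if (p.matcher(className).matches()) {
				return true;
			}
		}
		return false;
	}

	/** Applies the first matching rule (an assumption of this model), else keeps the name. */
	static String relocate(String className, List<Rule> rules) {
		for (Rule rule : rules) {
			if (rule.matches(className)) {
				return rule.shadedPattern + className.substring(rule.pattern.length());
			}
		}
		return className;
	}

	/** Two of the rules from the flink-s3-fs-hadoop pom, in pom order. */
	static List<Rule> s3HadoopRules() {
		String prefix = "org.apache.flink.fs.s3hadoop.shaded.";
		return Arrays.asList(
			new Rule("org.apache.flink.runtime.util", prefix + "org.apache.flink.runtime.util",
				Collections.singletonList("org.apache.flink.runtime.util.**Hadoop*"),
				Collections.<String>emptyList()),
			new Rule("org.apache", prefix + "org.apache",
				Collections.<String>emptyList(),
				Arrays.asList("org.apache.flink.**", "org.apache.log4j.**")));
	}
}
```

Under these two rules, Hadoop classes move under the shaded prefix. `org.apache.flink.runtime.util` classes are relocated only if their name contains `Hadoop`. `org.apache.flink.**`, `org.apache.log4j.**` and JDK classes keep their original names.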

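Step 3 of the README can be approximated by inspecting the jar's entry names, which is also the spirit of the shading checks this commit adds to travis. The Java sketch below is hypothetical and is not the check in `tools/travis_mvn_watchdog.sh`. The class `ShadedJarCheck` is illustrative. It covers only the class-location rules and the `core-*.xml` resource rules, and it skips the `META-INF/services` checks.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical check of a shaded flink-s3-fs-hadoop jar, modelled on README step 3. */
final class ShadedJarCheck {

	static final String SHADED_PREFIX = "org/apache/flink/fs/s3hadoop/shaded/";
	static final String FACTORY_CLASS = "org/apache/flink/fs/s3hadoop/S3FileSystemFactory.class";

	private ShadedJarCheck() {
	}

	/** Returns human-readable problems found in the given jar entry names (empty if none). */
	static List<String> check(List<String> entryNames) {
		List<String> problems = new ArrayList<>();
		for (String name : entryNames) {
			// Classes outside META-INF must be relocated, except the factory itself.
			boolean isClass = name.endsWith(".class") && !name.startsWith("META-INF/");
			if (isClass && !name.startsWith(SHADED_PREFIX) && !name.equals(FACTORY_CLASS)) {
				problems.add("unshaded class: " + name);
			}
			// Unshaded Hadoop defaults would bring back non-relocated class names.
			if (name.equals("core-default.xml") || name.equals("core-site.xml")) {
				problems.add("unexpected resource: " + name);
			}
		}
		if (!entryNames.contains("core-default-shaded.xml")) {
			problems.add("missing resource: core-default-shaded.xml");
		}
		return problems;
	}

	/** Usage: java ShadedJarCheck path/to/flink-s3-fs-hadoop.jar */
	public static void main(String[] args) throws IOException {
		if (args.length != 1) {
			System.err.println("usage: ShadedJarCheck <jar>");
			return;
		}
		List<String> names = new ArrayList<>();
		try (ZipFile jar = new ZipFile(args[0])) {
			for (ZipEntry entry : Collections.list(jar.entries())) {
				names.add(entry.getName());
			}
		}
		List<String> problems = check(names);
		problems.forEach(System.err::println);
		if (!problems.isEmpty()) {
			System.exit(1);
		}
	}
}
```

The rules live in a pure `check(List<String>)` method, so they can be tested without building a jar. `main` only lists the entries of a real jar and exits non-zero if it finds problems.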