StephanEwen commented on issue #6663: [FLINK-10209][build] Exclude jdk.tools dependency from hadoop
URL: https://github.com/apache/flink/pull/6663#issuecomment-424054104

The following context may make my line of thinking easier to follow: the hadoop-shaded module is a convenience artifact, and one we have previously discussed phasing out eventually. It is used (1) to compile against (for the HDFS / YARN / Kerberos code) and (2) as a jar added to the lib folder.

- Concerning a general exclusion for the jdk.tools dependency: since we don't compile Hadoop itself and we don't redistribute that dependency (it is a system dependency), I cannot see how a general exclusion would be a problem. It simplifies the build files, which is a real benefit.
- We should encourage use of HADOOP_CLASSPATH rather than our Hadoop fat jar anyway. That reduces the value of the second use of the hadoop-shaded project, the packaging into the dist lib folder. If we go purely for the HADOOP_CLASSPATH variant, we could remove that project altogether and simply have a provided or optional Hadoop dependency.
- The fat Hadoop jar is used for client-side functionality only, and since version 2, Hadoop claims to have a stable setup (HDFS protocol, Kerberos config, etc.), so we don't need a jar for each major/minor version; one per major version should work. We should not need the vendor-specific versions either. And there is still the HADOOP_CLASSPATH workaround in case any of the vendor-specific versions turns out to have a compatibility problem after all.
- Concerning moving Hadoop to flink-shaded: we don't have to find a setup that converges across Hadoop versions; that is exactly the point. We pick some Hadoop versions for which we want to build convenience jars and converge those manually or by shading.
