On 3 Apr 2018, at 01:30, Saisai Shao <sai.sai.s...@gmail.com> wrote:
> Yes, the main blocking issue is that the Hive version used in Spark (1.2.1.spark) doesn't support running on Hadoop 3; Hive checks the Hadoop version at runtime [1]. Beyond that, I think some pom changes should be enough to support Hadoop 3. If we want to use the Hadoop 3 shaded client jar, then the pom requires lots of changes, but that is not necessary.
>
> [1] https://github.com/apache/hive/blob/6751225a5cde4c40839df8b46e8d241fdda5cd34/shims/common/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java#L144

I don't think the hadoop-shaded JAR is complete enough for Spark yet... it was very much driven by HBase's needs. But there's only one way to get Hadoop to fix that: try the move, find the problems, complain noisily. Then Hadoop 3.2 and/or a 3.1.x for x >= 1 can have the broader shading.

Assume my name is next to the "Shade hadoop-cloud-storage" problem, though given that aws-java-sdk-bundle is already 50 MB, I don't plan to shade that at all. The AWS shading already isolates everything from Amazon's choice of Jackson, which was one of the sore points.

-Steve
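For readers following the thread: the runtime check referenced at [1] can be sketched roughly as below. This is a paraphrase, not the actual Hive source; the class name and shim identifier here are made up for illustration. The point is that Hive parses the major version of the Hadoop it finds on the classpath and throws for anything it doesn't recognize, which is why a Hive 1.2.1 build fails on Hadoop 3 regardless of pom changes.

```java
// Hedged sketch of a ShimLoader-style version gate (illustrative only,
// not copied from Hive). The shim name "0.23" loosely mirrors Hive's
// mapping of Hadoop 2.x to its hadoop-0.23-era shim.
public class ShimLoaderSketch {
    static String getMajorVersion(String hadoopVersion) {
        // e.g. "2.7.3" -> "2", "3.1.0" -> "3"
        String major = hadoopVersion.split("\\.")[0];
        switch (major) {
            case "2":
                return "0.23"; // hypothetical shim name for Hadoop 2.x
            default:
                // Unrecognized majors (including 3) are rejected at runtime
                throw new IllegalArgumentException(
                    "Unrecognized Hadoop major version number: " + hadoopVersion);
        }
    }

    public static void main(String[] args) {
        System.out.println(getMajorVersion("2.7.3")); // accepted
        try {
            getMajorVersion("3.1.0");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A version gate like this is why the blocker is the Hive fork itself, not Spark's build: even with the poms updated for Hadoop 3, the check fires as soon as the shims load.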