On 3 Apr 2018, at 01:30, Saisai Shao <sai.sai.s...@gmail.com> wrote:

Yes, the main blocking issue is that the Hive version used in Spark 
(1.2.1.spark) doesn't support running on Hadoop 3. Hive checks the Hadoop 
version at runtime [1]. Besides this, I think some pom changes should be enough 
to support Hadoop 3.

If we want to use the Hadoop 3 shaded client jar, then the pom requires lots of 
changes, but this is not necessary.


[1] 
https://github.com/apache/hive/blob/6751225a5cde4c40839df8b46e8d241fdda5cd34/shims/common/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java#L144
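
For anyone who doesn't want to click through, the kind of check [1] performs 
can be sketched roughly as below. This is a minimal illustration, not the 
actual Hive code: the class name, method name and returned labels are 
hypothetical, and the version string is passed in as a parameter rather than 
read from Hadoop.

```java
// Hypothetical sketch of a major-version gate similar in spirit to [1].
// Names and labels here are illustrative assumptions, not Hive's real API.
final class HadoopVersionCheck {

    // Maps a Hadoop version string such as "2.7.3" to a shim label,
    // rejecting any major version the sketch does not know about.
    static String shimLabelFor(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException(
                "Illegal Hadoop version: " + hadoopVersion);
        }
        int major = Integer.parseInt(parts[0]);
        switch (major) {
            case 1:
                return "hadoop-1"; // placeholder label
            case 2:
                return "hadoop-2"; // placeholder label
            default:
                // A check of this shape fails at runtime on Hadoop 3.
                throw new IllegalArgumentException(
                    "Unrecognized Hadoop major version number: " + hadoopVersion);
        }
    }

    public static void main(String[] args) {
        System.out.println(shimLabelFor("2.7.3"));
        try {
            shimLabelFor("3.0.0");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a gate of this shape, no pom change alone gets past the failure on a 
3.x version string; the check itself has to learn about the new major version.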


I don't think the hadoop-shaded JAR is complete enough for Spark yet; it was 
very much driven by HBase's needs. But there's only one way to get Hadoop to 
fix that: try the move, find the problems, complain noisily. Then Hadoop 3.2 
and/or a 3.1.x for x>=1 can have the broader shading.

Assume my name is next to the "Shade hadoop-cloud-storage" problem, though 
given that aws-java-sdk-bundle is already 50 MB, I don't plan to shade that at 
all. The AWS shading already isolates everything from Amazon's choice of 
Jackson, which was one of the sore points.

-Steve
