Can I note that if Spark 2.0 is going to be Java 8+ only, then Hadoop 2.6.x 
should be the minimum Hadoop version.

https://issues.apache.org/jira/browse/HADOOP-11090

Where things get complicated is the situation of Hadoop services on Java 7 
and Spark on Java 8 in its own JVM.

I'm not sure you could get away with having a newer version of the Hadoop 
classes in the Spark assembly/lib dir without running into incompatibilities 
with the Hadoop JNI libraries. These are currently backwards compatible, but 
trying to link Hadoop 2.7 classes against a Hadoop 2.6 native library will 
generate an UnsatisfiedLinkError. Meaning: the whole cluster's Hadoop libs 
have to be in sync, or at least the main cluster release must be on a version 
of Hadoop 2.x >= the Spark-bundled edition.
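
As a quick check of a host, here's a minimal sketch of probing whether 
libhadoop loaded at all; NativeCodeLoader is a real Hadoop utility, but the 
wrapper class around it is illustrative only. Note that even a successful 
load doesn't rule out a mismatch: a jar/native-lib skew only surfaces as an 
UnsatisfiedLinkError the first time a missing native method is invoked.

    import org.apache.hadoop.util.NativeCodeLoader;

    // Illustrative wrapper; NativeCodeLoader is the real Hadoop class.
    public class NativeLibCheck {
      public static void main(String[] args) {
        // true only if System.loadLibrary("hadoop") succeeded in this JVM
        System.out.println("native code loaded: "
            + NativeCodeLoader.isNativeCodeLoaded());
      }
    }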

Ignoring that detail, the minimum versions would be:

Hadoop 2.6.1+
Guava >= 15? 17?

I think the outcome of Hadoop < 2.6 and JDK >= 8 is "undefined"; all bug 
reports will be met with a "please upgrade, re-open if the problem is still 
there".

Kerberos is a particular trouble spot here: you need Hadoop 2.6.1+ for 
Kerberos to work on Java 8 and recent versions of Java 7 (HADOOP-10786).
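
If you want to validate a specific JDK/Hadoop pairing, a minimal sketch like 
this exercises the UGI keytab login path; the principal and keytab path are 
hypothetical, but UserGroupInformation and its methods are the real Hadoop 
security API. On a bad combination the login throws rather than returning.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLoginCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // hypothetical principal and keytab; adjust for your realm
        UserGroupInformation.loginUserFromKeytab(
            "spark/host@EXAMPLE.COM", "/etc/security/spark.keytab");
        System.out.println("logged in as "
            + UserGroupInformation.getLoginUser());
      }
    }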

Note also that HADOOP-11628 (SPNEGO + CNAMEs) is in 2.8 only. I'll see about 
pulling it into 2.7.x, though I'm reluctant to go near 2.6, just to keep that 
branch extra stable.


Thomas: you've got the big clusters; what versions of Hadoop will they be on 
by the time you look at Spark 2.0?

-Steve



