If 3.x is going to be Java 8 & not backwards compatible, I don't expect anyone
to want to use it in production until some time deep into 2016.
Issue: JDK 8 vs 7
It will require Hadoop clusters to move up to Java 8. While there's dev pull
for this, there's ops pull against it: people are still in the
moving-off-Java-6 phase because of that "it's working, don't update it"
philosophy. Java 8 is compelling to us coders, but that doesn't mean ops want it.
You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*; the
main thing is setting up JAVA_HOME. That's something we could make easier
somehow (maybe a min Java version field in resource requests that would let
apps say Java 8, Java 9, ...). YARN could not only set up JVM paths, it could
fail fast if the requested Java version wasn't available.
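
To make the "min Java version" idea concrete, here is a rough sketch of what the node-side lookup might look like. Nothing here is an existing YARN API: the JvmSelector class, the version-to-JAVA_HOME table and the "lowest installed JVM that satisfies the request" policy are all assumptions made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch only: not part of YARN.
 * Maps a requested minimum Java version to a JAVA_HOME on the node,
 * and fails fast when no installed JVM satisfies the request.
 */
final class JvmSelector {
    // Assumed node-side table: Java major version -> JAVA_HOME path.
    private final TreeMap<Integer, String> installed = new TreeMap<>();

    JvmSelector(Map<Integer, String> installedJvms) {
        installed.putAll(installedJvms);
    }

    /** Returns JAVA_HOME of the lowest installed JVM >= minJavaVersion. */
    String javaHomeFor(int minJavaVersion) {
        Map.Entry<Integer, String> match = installed.ceilingEntry(minJavaVersion);
        if (match == null) {
            // Fail fast, before any container is launched.
            throw new IllegalStateException("No JVM >= Java " + minJavaVersion
                + " on this node; installed versions: " + installed.keySet());
        }
        return match.getValue();
    }
}
```

An app asking for Java 8 on a node that only has Java 7 would get the exception straight away, rather than a container that dies on startup.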
What we can't do in hadoop core today is set javac.version=1.8 & use Java 8
code. Downstream code can do that (Hive, etc); they just need to accept that
they don't get to play on JDK7 clusters if they embrace lambda expressions.
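
The reason Java 8 code can't play on JDK7 clusters is the class file version: javac targeting 1.8 emits class files with major version 52, and a Java 7 JVM rejects anything above 51 with an UnsupportedClassVersionError. A minimal sketch of that check follows; the ClassFileVersion helper is a made-up name for illustration, not something in Hadoop.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: reads the major version from a class file header. */
final class ClassFileVersion {
    static final int JAVA_7_MAJOR = 51;
    static final int JAVA_8_MAJOR = 52;

    /** Class file header: u4 magic (0xCAFEBABE), u2 minor, u2 major. */
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic number");
        }
        buf.getShort();                 // skip the minor version
        return buf.getShort() & 0xFFFF; // major version as unsigned 16-bit
    }

    /** True if a JVM supporting up to jvmMaxMajor can load these bytes. */
    static boolean loadableOn(byte[] classBytes, int jvmMaxMajor) {
        return majorVersion(classBytes) <= jvmMaxMajor;
    }
}
```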
So...we need to stay on java 7 for some time due to ops pull; downstream apps
get to choose what they want. We can/could enhance YARN to make JVM choice more
declarative.
Issue: Incompatible changes
Without knowing what is proposed for "an incompatible classpath change", I
can't say whether this is something that could be made optional. If it isn't,
then it is a Python-3-class "rewrite your code" event, which is going
to be particularly traumatic to things like Hive that already play complex
classpath games. I'm currently against any mandatory change here, though would
love to see an optional one. And if it is optional, it ceases to be an
incompatible change...
Issue: Getting trunk out the door
The main diff between branch-2 and trunk is currently the bash script changes.
These don't break client apps. They may or may not break bigtop & other
downstream hadoop stacks, but developers don't need to worry about this: no
recompilation is necessary.
Proposed: ship trunk as a 2.x release, compatible with JDK7 & existing Java code.
It seems to me that I could go

  git checkout trunk
  mvn versions:set -DnewVersion=2.8.0-SNAPSHOT

We'd then have a version of Hadoop-trunk we could ship later this year,
compatible at the JDK and API level with the existing java code & JDK7+
clusters.
A classpath fix that is optional/compatible can then go out on the 2.x line,
saving the 3.x tag for something that really breaks things, forces all
downstream apps to set up new hadoop profiles, have separate modules &
generally hate the hadoop dev team.
This lets us tick off the "recent trunk release" and "fixed shell scripts"
items, pushing out those benefits to people sooner rather than later, and puts
off the "Hello, we've just broken your code" event for another 12+ months.
Comments?
-Steve