I took the liberty of adding line breaks to Joep's mail. Thanks for the great feedback Joep. The goal with 3.x is to maintain API and wire compatibility with 2.x, which I think addresses most of your concerns. A 2.x client running on JDK7 would then still be able to talk to a 3.x server running on JDK8. Classpath isolation is also proposed as a banner feature, which directly addresses g). This might require new (major?) releases for some downstreams, but the feedback I've heard related to this has been very positive.
Best, Andrew ============== It depends on the "Return on Pain". While it is hard to quantify the returns in the abstract, I can try to sketch out which kinds of changes are the most painful and therefore cause the most friction for us.In rough order of increasing pain to deal with: a) There is a new upstream (3.x) release, but it is so backwards incompatible, that we won't be able to adopt it for the foreseeable future. Even though we don’t adopt it, it still causes pain. Now development becomes that much harder because we'd have to get a patch for trunk, a patch for 3.x and a patch for the 2.x branch. Conversely if patches go into 2.x only, now the releases start drifting apart. We already have (several dozen) patches in production that have not yet made it upstream, but are striving to keep this list as short as possible to reduce the rebase pain and risk. b) Central Daemons (RM, or pairs of HA NNs) have to be restarted causing a cluster-wide outage. The work towards work-preserving restart in progress in various areas makes these kinds of upgrades less painful. c) Server-side requires different runtime from client-side. We'd have to produce multiple artifacts, but we could make that work. For example, NN code uses Java 8 features, but clients can still use Java 7 to submit jobs and read/write HDFS. Now for the more painful backwards incompatibilities: d) All clients have to recompile (a token uses protobuf instead of thrift, an interface becomes an abstract class or vice versa). Not only do these kinds of changes make a rolling upgrade impossible, more importantly it requires all our clients to recompile their code and redeploy their production pipelines in a coordinated fashion. On top of this, we have multiple large production clusters and clients would have to keep multiple incompatible pipelines running, because we simply cannot upgrade all clusters in all datacenters at the same time. e) Customers are forced to restart and can no longer run with JDK 7 clients because job submission client code or HDFS has started using JDK 8-only features. Eventually group will reduce, but for at least another year if not more this will be very painful. f) Even more painful is when Yarn/MapReduce APIs change so that customers not only have to recompile, but also have to change hundreds of scripts / flows in order to deal with the API change. This problem is compounded by other tools in the Hadoop ecosystem that would have to deal with these changes. There would be two different versions of Cascading, HBase, Hive, Pig, Spark, Tez, you name it. g) Without proper classpath isolation, third party dependency changes (guava, protobuf version, etc) are probably as painful as API changes. h) HDFS client API get changed in a backwards incompatible way requiring all clients to change their code, recompile and re-start their services in a coordinated way. We have tens of thousands of production servers reading from / writing to Hadoop and cannot have all of these long running clients restart at the same time. To put these in perspective, despite us being one of the early adopters of Hadoop 2 in production at the scale of many thousands of nodes, we are still wrapping up the migration from our last Hadoop 1 clusters. We have many war stories about many of the above incompatibilities. As I've tweeted about publicly the gains have been significant with this migration to Hadoop 2, but the friction has also been considerable. To get specific about JDK 8, we are intending to move to Java 8. Right now we're letting clients choose to run tasks with JDK 8 optionally, then we'll make it default. We'll switch to the daemons running with JDK 8. What we're concerned it would then be feasible to use JDK 8 features on the servers side (see c) above). I'm suggesting that if we do allow backwards incompatible changes, we introduce an upgrade path through an agreed upon stepping stone release. For example, a protocol changing from thrift to protobuf can be done in steps. In the stepping-stone release both would be accepted. in the following release (or two releases later) the thrift version support is dropped.This would allow for a rolling upgrade, or even if a cluster-wide restart is needed, at least customers can adopt to the change at a pace of weeks or months. Once no more (important) customers are running the thrift client, we could then roll to the next release. It would be useful to coordinate the backwards incompatibilities so that not every release becomes a stepping-stone release.