Thanks, Andrew and Joep. +1 for maintaining wire and API compatibility, but moving to JDK8 in 3.0
best, Colin On Mon, Mar 16, 2015 at 3:22 PM, Andrew Wang <andrew.w...@cloudera.com> wrote: > I took the liberty of adding line breaks to Joep's mail. > > Thanks for the great feedback Joep. The goal with 3.x is to maintain API > and wire compatibility with 2.x, which I think addresses most of your > concerns. A 2.x client running on JDK7 would then still be able to talk to > a 3.x server running on JDK8. Classpath isolation is also proposed as a > banner feature, which directly addresses g). This might require new > (major?) releases for some downstreams, but the feedback I've heard related > to this has been very positive. > > Best, > Andrew > > ============== > > It depends on the "Return on Pain". While it is hard to quantify the > returns in the abstract, I can try to sketch out which kinds of changes are > the most painful and therefore cause the most friction for us.In rough > order of increasing pain to deal with: > > a) There is a new upstream (3.x) > release, but it is so backwards incompatible, that we won't be able to > adopt it for the foreseeable future. Even though we don’t adopt it, it > still causes pain. Now development becomes that much harder because we'd > have to get a patch for trunk, a patch for 3.x and a patch for the 2.x > branch. Conversely if patches go into 2.x only, now the releases start > drifting apart. We already have (several dozen) patches in production that > have not yet made it upstream, but are striving to keep this list as short > as possible to reduce the rebase pain and risk. > > b) Central Daemons (RM, or > pairs of HA NNs) have to be restarted causing a cluster-wide outage. The > work towards work-preserving restart in progress in various areas makes > these kinds of upgrades less painful. > > c) Server-side requires different > runtime from client-side. We'd have to produce multiple artifacts, but we > could make that work. For example, NN code uses Java 8 features, but > clients can still use Java 7 to submit jobs and read/write HDFS. > > Now for the more painful backwards incompatibilities: > > d) All clients have to recompile > (a token uses protobuf instead of thrift, an interface becomes an abstract > class or vice versa). Not only do these kinds of changes make a rolling > upgrade impossible, more importantly it requires all our clients to > recompile their code and redeploy their production pipelines in a > coordinated fashion. On top of this, we have multiple large production > clusters and clients would have to keep multiple incompatible pipelines > running, because we simply cannot upgrade all clusters in all datacenters > at the same time. > > e) Customers are forced to restart and can no longer run > with JDK 7 clients because job submission client code or HDFS has started > using JDK 8-only features. Eventually group will reduce, but for at least > another year if not more this will be very painful. > > f) Even more painful is > when Yarn/MapReduce APIs change so that customers not only have to > recompile, but also have to change hundreds of scripts / flows in order to > deal with the API change. This problem is compounded by other tools in the > Hadoop ecosystem that would have to deal with these changes. There would be > two different versions of Cascading, HBase, Hive, Pig, Spark, Tez, you name > it. > > g) Without proper classpath isolation, third party dependency changes > (guava, protobuf version, etc) are probably as painful as API changes. > > h) HDFS client API get changed in a backwards incompatible way requiring all > clients to change their code, recompile and re-start their services in a > coordinated way. We have tens of thousands of production servers reading > from / writing to Hadoop and cannot have all of these long running clients > restart at the same time. > > To put these in perspective, despite us being one > of the early adopters of Hadoop 2 in production at the scale of many > thousands of nodes, we are still wrapping up the migration from our last > Hadoop 1 clusters. We have many war stories about many of the above > incompatibilities. As I've tweeted about publicly the gains have been > significant with this migration to Hadoop 2, but the friction has also been > considerable. > > To get specific about JDK 8, we are intending to move to Java > 8. Right now we're letting clients choose to run tasks with JDK 8 > optionally, then we'll make it default. We'll switch to the daemons running > with JDK 8. What we're concerned it would then be feasible to use JDK 8 > features on the servers side (see c) above). > > I'm suggesting that if we do > allow backwards incompatible changes, we introduce an upgrade path through > an agreed upon stepping stone release. For example, a protocol changing from > thrift to protobuf can be done in steps. In the stepping-stone release both > would be accepted. in the following release (or two releases later) the > thrift version support is dropped.This would allow for a rolling upgrade, > or even if a cluster-wide restart is needed, at least customers can adopt > to the change at a pace of weeks or months. Once no more (important) > customers are running the thrift client, we could then roll to the next > release. It would be useful to coordinate the backwards incompatibilities so > that not every release becomes a stepping-stone release.