Re: Hadoop - Major releases

Colin P. McCabe Tue, 17 Mar 2015 10:49:38 -0700

Thanks, Andrew and Joep.

+1 for maintaining wire and API compatibility, but moving to JDK8 in 3.0


best,
Colin

On Mon, Mar 16, 2015 at 3:22 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
> I took the liberty of adding line breaks to Joep's mail.
>
> Thanks for the great feedback Joep. The goal with 3.x is to maintain API
> and wire compatibility with 2.x, which I think addresses most of your
> concerns. A 2.x client running on JDK7 would then still be able to talk to
> a 3.x server running on JDK8. Classpath isolation is also proposed as a
> banner feature, which directly addresses g). This might require new
> (major?) releases for some downstreams, but the feedback I've heard related
> to this has been very positive.
>
> Best,
> Andrew
>
> ==============
>
> It depends on the "Return on Pain". While it is hard to quantify the
> returns in the abstract, I can try to sketch out which kinds of changes are
> the most painful and therefore cause the most friction for us. In rough
> order of increasing pain to deal with:
>
> a) There is a new upstream (3.x)
> release, but it is so backwards incompatible that we won't be able to
> adopt it for the foreseeable future. Even though we don't adopt it, it
> still causes pain. Now development becomes that much harder because we'd
> have to get a patch for trunk, a patch for 3.x and a patch for the 2.x
> branch. Conversely if patches go into 2.x only, now the releases start
> drifting apart. We already have (several dozen) patches in production that
> have not yet made it upstream, but are striving to keep this list as short
> as possible to reduce the rebase pain and risk.
>
> b) Central Daemons (RM, or
> pairs of HA NNs) have to be restarted causing a cluster-wide outage. The
> work towards work-preserving restart in progress in various areas makes
> these kinds of upgrades less painful.
>
> c) Server-side requires different
> runtime from client-side. We'd have to produce multiple artifacts, but we
> could make that work. For example, NN code uses Java 8 features, but
> clients can still use Java 7 to submit jobs and read/write HDFS.
>
> Now for the more painful backwards incompatibilities:
>
> d) All clients have to recompile
> (a token uses protobuf instead of thrift, an interface becomes an abstract
> class or vice versa). Not only do these kinds of changes make a rolling
> upgrade impossible, but more importantly they require all our clients to
> recompile their code and redeploy their production pipelines in a
> coordinated fashion. On top of this, we have multiple large production
> clusters and clients would have to keep multiple incompatible pipelines
> running, because we simply cannot upgrade all clusters in all datacenters
> at the same time.
>
> e) Customers are forced to restart and can no longer run
> with JDK 7 clients because job submission client code or HDFS has started
> using JDK 8-only features. Eventually this group will shrink, but for at least
> another year, if not more, this will be very painful.
>
> f) Even more painful is
> when Yarn/MapReduce APIs change so that customers not only have to
> recompile, but also have to change hundreds of scripts / flows in order to
> deal with the API change. This problem is compounded by other tools in the
> Hadoop ecosystem that would have to deal with these changes. There would be
> two different versions of Cascading, HBase, Hive, Pig, Spark, Tez, you name
> it.
>
> g) Without proper classpath isolation, third party dependency changes
> (guava, protobuf version, etc) are probably as painful as API changes.
>
> h) The HDFS client API gets changed in a backwards incompatible way, requiring all
> clients to change their code, recompile and re-start their services in a
> coordinated way. We have tens of thousands of production servers reading
> from / writing to Hadoop and cannot have all of these long running clients
> restart at the same time.
>
> To put these in perspective, despite us being one
> of the early adopters of Hadoop 2 in production at the scale of many
> thousands of nodes, we are still wrapping up the migration from our last
> Hadoop 1 clusters. We have many war stories about many of the above
> incompatibilities. As I've tweeted about publicly, the gains have been
> significant with this migration to Hadoop 2, but the friction has also been
> considerable.
>
> To get specific about JDK 8, we are intending to move to Java
> 8. Right now we're letting clients optionally run tasks with JDK 8;
> then we'll make it the default. After that we'll switch to running the daemons
> with JDK 8. What we're concerned with is that it would then be feasible to use
> JDK 8 features on the server side (see c) above).
>
> I'm suggesting that if we do
> allow backwards incompatible changes, we introduce an upgrade path through
> an agreed-upon stepping-stone release. For example, a protocol changing from
> thrift to protobuf can be done in steps. In the stepping-stone release both
> would be accepted. In the following release (or two releases later) the
> thrift version support is dropped. This would allow for a rolling upgrade,
> or even if a cluster-wide restart is needed, at least customers can adapt
> to the change at a pace of weeks or months. Once no more (important)
> customers are running the thrift client, we could then roll to the next
> release. It would be useful to coordinate the backwards incompatibilities so
> that not every release becomes a stepping-stone release.
