Re: Looking to a Hadoop 3 release

Andrew Wang Thu, 05 Mar 2015 17:26:23 -0800

I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
page. In addition to the two things I've been pushing, I also looked
through Allen's list (thanks Allen for making this) and picked out the
shell script rewrite and the removal of HFTP as big changes. This would be
the place to propose features for inclusion in 3.x; I'd particularly
appreciate help on the YARN/MR side.


Based on what I'm hearing, let me modulate my proposal to the following:

- We avoid cutting branch-3, and release off of trunk. The trunk-only
changes don't look that scary, so I think this is fine. This does mean we
need to be more rigorous before merging branches to trunk. I think
Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
be very helpful in this regard.
- We do not include anything to break wire compatibility unless (as Jason
says) it's an unbelievably awesome feature.
- No harm in rolling alphas from trunk, as it doesn't lock us to anything
compatibility-wise. Downstreams like releases.

I'll take Steve's advice about not locking GA to a given date, but I also
share his belief that we can alpha/beta/GA faster than it took for Hadoop
2. Let's roll some intermediate releases, work on the roadmap items, and
see how we're feeling in a few months.

Best,
Andrew

On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <ss...@apache.org> wrote:

> I think it'll be useful to have a discussion about what else people would
> like to see in Hadoop 3.x - especially if the change is potentially
> incompatible. Also, what we expect the release schedule to be for major
> releases and what triggers them - JVM version, major features, the need for
> incompatible changes? Assuming major versions will not be released every 6
> months/1 year (adoption time, fairly disruptive for downstream projects,
> and users) - considering additional features/incompatible changes for 3.x
> would be useful.
>
> Some features that come to mind immediately would be
> 1) enhancements to the RPC mechanics - specifically support for async RPC /
> two-way communication. There are a lot of places where we re-use heartbeats
> to send more information than what would be done if the RPC layer supported
> these features. Some of this can be done in a compatible manner to the
> existing RPC sub-system. Others like two-way communication probably cannot.
> After this, having HDFS/YARN actually make use of these changes. The other
> consideration is adoption of an alternate system like gRPC, which would be
> incompatible.
> 2) Simplification of configs - potentially separating client side configs
> and those used by daemons. This is another source of perpetual confusion
> for users.
>
> Thanks
> - Sid
>
>
> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <ste...@hortonworks.com>
> wrote:
>
> > Sorry, Outlook dequoted Alejandro's comments.
> >
> > Let me try again with his comments in italic and proofreading of mine
> >
> > On 05/03/2015 13:59, "Steve Loughran" <ste...@hortonworks.com> wrote:
> >
> >
> >
> > On 05/03/2015 13:05, "Alejandro Abdelnur" <tuc...@gmail.com> wrote:
> >
> > IMO, if part of the community wants to take on the responsibility and
> > work that it takes to do a new major release, we should not discourage
> > them from doing that.
> >
> > Having multiple major branches active is a standard practice.
> >
> > Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
> > long time to get out, and during that time 0.21, 0.22, got released and
> > ignored; 0.23 picked up and used in production.
> >
> > The 2.0.4-alpha release was more of a troublespot as it got picked up
> > widely enough to be used in products, and changes were made between that
> > alpha & 2.2 itself which raised compatibility issues.
> >
> > For 3.x I'd propose
> >
> >
> >   1.  Have less longevity of 3.x alpha/beta artifacts
> >   2.  Make clear there are no guarantees of compatibility from alpha/beta
> > releases to shipping. Best effort, but not to the extent that it gets in
> > the way. More succinctly: we will care more about seamless migration from
> > 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> >   3.  Anybody who ships code based on 3.x alpha/beta to recognise and
> > accept policy (2). Hadoop's "instability guarantee" for the 3.x
> > alpha/beta phase
> >
> > As well as backwards compatibility, we need to think about Forwards
> > compatibility, with the goal being:
> >
> > Any app written/shipped with the 3.x release binaries (JAR and native)
> > will work in and against a 3.y Hadoop cluster, for all x, y in Natural
> > where y>=x  and is-release(x) and is-release(y)
> >
> > That's important, as it means all server-side changes in 3.x which are
> > expected to mandate client-side updates (protocols, HDFS erasure
> > decoding, security features) must be considered complete and stable
> > before we can say is-release(x). In an ideal world, we'll even get the
> > semantics right with tests to show this.
> >
> > Fixing classpath hell downstream is certainly one feature I am +1 on.
> > But: it's only one of the features, and given there's not any design doc
> > on that JIRA, way too immature to set a release schedule on. An alpha
> > schedule with no guarantees and a regular alpha roll could be viable, as
> > new features go in and can then be used to experimentally try this stuff
> > in branches of HBase (well volunteered, Stack!), etc. Of course
> > instability guarantees will be transitive downstream.
> >
> >
> > This time around we are not replacing the guts as we did from Hadoop 1 to
> > Hadoop 2, but doing superficial surgery to address issues that were not
> > considered (or were too much to take on on top of the guts transplant).
> >
> > For the split brain concern, we did a great job of maintaining Hadoop 1
> > and Hadoop 2 until Hadoop 1 faded away.
> >
> > And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
> > compatibility.
> >
> >
> > Based on that experience I would say that the coexistence of Hadoop 2 and
> > Hadoop 3 will be much less demanding/traumatic.
> >
> > The re-layout of all the source trees was a major change there; assuming
> > there's no refactoring or switch of build tools, picking things back
> > will be tractable.
> >
> >
> > Also, to facilitate the coexistence we should limit Java language
> > features to Java 7 (even if the runtime is Java 8); once Java 7 is not
> > used anymore we can remove this limitation.
> >
> > +1; setting javac.version will fix this
> >
> > What is nice about having Java 8 as the base JVM is that it means you can
> > be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
> > and libs can use all Java 8 features they want to.
> >
> > There's one policy change to consider there: possibly, just
> > possibly, we could allow new modules in hadoop-tools to adopt Java 8
> > language features early, provided everyone recognised that "backport to
> > branch-2" isn't going to happen.
> >
> > -Steve
> >
> >
>
