Re: Looking to a Hadoop 3 release

Alejandro Abdelnur Thu, 05 Mar 2015 22:27:03 -0800

If classloader isolation is in place, then dependency versions can be
upgraded freely, as they won't pollute the apps' space (things get trickier
if there is an ON/OFF switch).
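
As a rough illustration of the isolation idea, here is a minimal sketch of a
child-first classloader in Java. It is not Hadoop's implementation: the class
name IsolatingClassLoader, the isParentFirst helper, and the prefix list are
all hypothetical. It only shows how an app's own jars could take precedence
over the framework's copies of the same libraries.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first loader: classes are looked up in the app's own
 * jars first, so the app's dependency versions win over the framework's.
 */
class IsolatingClassLoader extends URLClassLoader {
    // Assumed list of package prefixes that must always come from the parent.
    private final String[] parentFirstPrefixes;

    IsolatingClassLoader(URL[] appJars, ClassLoader parent,
                         String... parentFirstPrefixes) {
        super(appJars, parent);
        this.parentFirstPrefixes = parentFirstPrefixes;
    }

    /** True if the class must be delegated to the parent loader first. */
    boolean isParentFirst(String className) {
        if (className.startsWith("java.") || className.startsWith("javax.")) {
            return true; // JDK classes are never loaded from app jars
        }
        for (String prefix : parentFirstPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (isParentFirst(name)) {
                    c = super.loadClass(name, false); // normal delegation
                } else {
                    try {
                        c = findClass(name); // app jars first
                    } catch (ClassNotFoundException e) {
                        c = super.loadClass(name, false); // fall back to parent
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With a loader like this, an app could be started under
new IsolatingClassLoader(appJars, frameworkLoader, "org.apache.hadoop.") so
that its own jetty or jackson jars are resolved before the framework's. An
ON/OFF switch would mean both delegation orders have to keep working.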


On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer <[email protected]> wrote:

>
> Is there going to be a general upgrade of dependencies?  I'm thinking of
> jetty & jackson in particular.
>
> On Mar 5, 2015, at 5:24 PM, Andrew Wang <[email protected]> wrote:
>
> > I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> > page. In addition to the two things I've been pushing, I also looked
> > through Allen's list (thanks Allen for making this) and picked out the
> > shell script rewrite and the removal of HFTP as big changes. This would be
> > the place to propose features for inclusion in 3.x; I'd particularly
> > appreciate help on the YARN/MR side.
> >
> > Based on what I'm hearing, let me modulate my proposal to the following:
> >
> > - We avoid cutting branch-3, and release off of trunk. The trunk-only
> > changes don't look that scary, so I think this is fine. This does mean we
> > need to be more rigorous before merging branches to trunk. I think
> > Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
> > be very helpful in this regard.
> > - We do not include anything to break wire compatibility unless (as Jason
> > says) it's an unbelievably awesome feature.
> > - No harm in rolling alphas from trunk, as it doesn't lock us to anything
> > compatibility wise. Downstreams like releases.
> >
> > I'll take Steve's advice about not locking GA to a given date, but I also
> > share his belief that we can alpha/beta/GA faster than it took for Hadoop
> > 2. Let's roll some intermediate releases, work on the roadmap items, and
> > see how we're feeling in a few months.
> >
> > Best,
> > Andrew
> >
> > On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <[email protected]> wrote:
> >
> >> I think it'll be useful to have a discussion about what else people would
> >> like to see in Hadoop 3.x - especially if the change is potentially
> >> incompatible. Also, what we expect the release schedule to be for major
> >> releases and what triggers them - JVM version, major features, the need
> >> for incompatible changes? Assuming major versions will not be released
> >> every 6 months/1 year (adoption time, fairly disruptive for downstream
> >> projects, and users) - considering additional features/incompatible
> >> changes for 3.x would be useful.
> >>
> >> Some features that come to mind immediately would be
> >> 1) enhancements to the RPC mechanics - specifically support for AsyncRPC /
> >> two-way communication. There are a lot of places where we re-use
> >> heartbeats to send more information than what would be done if the RPC
> >> layer supported these features. Some of this can be done in a manner
> >> compatible with the existing RPC sub-system. Others, like two-way
> >> communication, probably cannot. After this, having HDFS/YARN actually make
> >> use of these changes. The other consideration is adoption of an alternate
> >> system like gRPC, which would be incompatible.
> >> 2) Simplification of configs - potentially separating client-side configs
> >> and those used by daemons. This is another source of perpetual confusion
> >> for users.
> >>
> >> Thanks
> >> - Sid
> >>
> >>
> >> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <[email protected]>
> >> wrote:
> >>
> >>> Sorry, Outlook dequoted Alejandro's comments.
> >>>
> >>> Let me try again with his comments in italic and proofreading of mine
> >>>
> >>> On 05/03/2015 13:59, "Steve Loughran" <[email protected]> wrote:
> >>>
> >>>
> >>>
> >>> On 05/03/2015 13:05, "Alejandro Abdelnur" <[email protected]> wrote:
> >>>
> >>> IMO, if part of the community wants to take on the responsibility and
> >>> work that it takes to do a new major release, we should not discourage
> >>> them from doing that.
> >>>
> >>> Having multiple major branches active is a standard practice.
> >>>
> >>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
> >>> long time to get out, and during that time 0.21 and 0.22 got released and
> >>> ignored; 0.23 was picked up and used in production.
> >>>
> >>> The 2.0.4-alpha release was more of a trouble spot, as it got picked up
> >>> widely enough to be used in products, and changes were made between that
> >>> alpha & 2.2 itself which raised compatibility issues.
> >>>
> >>> For 3.x I'd propose
> >>>
> >>>
> >>>  1.  Have less longevity of 3.x alpha/beta artifacts
> >>>  2.  Make clear there are no guarantees of compatibility from alpha/beta
> >>> releases to shipping. Best effort, but not to the extent that it gets in
> >>> the way. More succinctly: we will care more about seamless migration from
> >>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> >>>  3.  Anybody who ships code based on 3.x alpha/beta to recognise and
> >>> accept policy (2). Hadoop's "instability guarantee" for the 3.x
> >>> alpha/beta phase
> >>>
> >>> As well as backwards compatibility, we need to think about Forwards
> >>> compatibility, with the goal being:
> >>>
> >>> Any app written/shipped with the 3.x release binaries (JAR and native)
> >>> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
> >>> where y>=x  and is-release(x) and is-release(y)
> >>>
> >>> That's important, as it means all server-side changes in 3.x which are
> >>> expected to mandate client-side updates (protocols, HDFS erasure
> >>> decoding, security features) must be considered complete and stable
> >>> before we can say is-release(x). In an ideal world, we'll even get the
> >>> semantics right with tests to show this.
> >>>
> >>> Fixing classpath hell downstream is certainly one feature I am +1 on.
> >>> But: it's only one of the features, and given there's not any design doc
> >>> on that JIRA, way too immature to set a release schedule on. An alpha
> >>> schedule with no-guarantees and a regular alpha roll could be viable, as
> >>> new features go in and can then be used to experimentally try this stuff
> >>> in branches of HBase (well volunteered, Stack!), etc. Of course
> >>> instability guarantees will be transitive downstream.
> >>>
> >>>
> >>> This time around we are not replacing the guts as we did from Hadoop 1 to
> >>> Hadoop 2, but doing superficial surgery to address issues that were not
> >>> considered (or were too much to take on top of the guts transplant).
> >>>
> >>> For the split brain concern, we did a great job of maintaining Hadoop 1
> >>> and Hadoop 2 until Hadoop 1 faded away.
> >>>
> >>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
> >>> compatibility.
> >>>
> >>>
> >>> Based on that experience I would say that the coexistence of Hadoop 2 and
> >>> Hadoop 3 will be much less demanding/traumatic.
> >>>
> >>> The re-layout of all the source trees was a major change there; assuming
> >>> there's no refactoring or switch of build tools, then picking things back
> >>> will be tractable.
> >>>
> >>>
> >>> Also, to facilitate the coexistence we should limit Java language
> >>> features to Java 7 (even if the runtime is Java 8); once Java 7 is not
> >>> used anymore we can remove this limitation.
> >>>
> >>> +1; setting javac.version will fix this
> >>>
> >>> What is nice about having Java 8 as the base JVM is that it means you can
> >>> be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
> >>> and libs can use all Java 8 features they want to.
> >>>
> >>> There's one policy change to consider there, which is that possibly, just
> >>> possibly, we could allow new modules in hadoop-tools to adopt Java 8
> >>> language features early, provided everyone recognised that "backport to
> >>> branch-2" isn't going to happen.
> >>>
> >>> -Steve
> >>>
> >>>
> >>
>
>
