Re: [VOTE] Release plan for Hadoop 2.0.5

Steve Loughran Mon, 06 May 2013 10:43:52 -0700

On 4 May 2013 18:38, Roman Shaposhnik <[email protected]> wrote:
> On Sat, May 4, 2013 at 3:33 PM, Tsz Wo Sze <[email protected]> wrote:
>> The proposal sounds like an ideal solution but it is impractical.
>> I think it is hard to make all API changes now and freezing them.
>> Either it will just take a long time to finish the API changes, or
>> we may miss some important API changes.
>
> In fact this was entire point of my comment wrt. high degree of
> focus towards downstream components of anything that could
> be potentially called a Hadoop beta release.
>
> The reality of the situation that we can't simply wish away is
> that Hadoop is not Java -- it doesn't have a formal testsuite
> along the lines of the TCK that can guarantee API stability.


I'm working on filesystem stuff, but that's only tested at the unit
test level, "does it match the requirements", which is different from
"does it work with Pig".

> We don't have that. Hence we might as well use the next
> best thing -- tons of code implemented downstream that
> actually exercise Hadoop APIs.

I'm in favour of this, and do want to tease out some of the swift-only
scale tests for a bigtop patch that we can apply to all filesystems
-they are good at finding bugs. For example, if you create a few
thousand files in a blobstore then try to delete them, that's when you
discover the RESTy endpoints throttle the operations and your code
starts failing unless you insert self-throttling and extend socket
timeouts.  Those are the fun things you want to find sooner rather
than later.

>
> Perhaps a shift of perspective is needed on the part of
> Hadoop community -- we should stop looking at downstream
> as just downstream and start looking at it as a de-facto
> TCK. If we assume that vantage point then things like
> making sure that there are regular Unit tests runs clearly
> become something that useful to Hadoop directly, not
> a 'downstream problem'.

This is something we need to start working on, with Jenkins picking up
problems sooner rather than later.  for it to work -and I say this
based on experience in multi-team projects- everyone needs to care
about problems downstream.

 What I don't want to do -and am -voting against, is Kos's proposal:

1 (non-binding)

There are some things in there -windows support- that shouldn't be
reverted as Kos wants, and I don't want to go near the Hadoop
numbering scheme again. 2.x is 2.x and let's not change our mind now.

As Nicholas says, we need to get some of the protocols and APIs stable
so they can stay forwards compatible. We've also had problems with
HDFS changes in the 2.0 alphas; those wire things need to be locked
down as well as -some- of the APIs. I'm (mostly) happy with management
layer stuff changing, but not the code that users and downstream apps
use.

That doesn't mean 2.x needs to be feature complete, only that everyone
in the core hadoop team is happy that it's got the hooks in there for
the roadmap, and make sure that HDFS v2 is stable and trusted. Because
if you look at the rate of change of Linux, you can say "they
alternate stable/unstable every six months", but the development of
ext4 is much more cautious -it has the same requirement as HDFS:
preserve data you care about.


We should get it out the door, then start to get that bigtop stack
doing the nightly build/test, with failures being reported back to
hadoop-common as well as bigtop. As a reference point, Apache gump was
the first "build everything from SCM" Java project in the ASF,
checking out everything, bootstrapping Ant off its build.sh script and
the JDK, then going into the XML parsers and logging, before trying to
do the rest of the DAG of Java projects. Its goal was portrayed as
"nightly build and tests for your project", but the Ant team viewed it
as our regression tests. Sadly Maven killed it by making it
near-impossible to build an up to date dependency graph.

If we have HDFS stable then YARN and MR can roll at a faster rate,
along with the other downstream parts.

One other thing I want to pick up on is that testing "at Y! scale"
isn't sufficient to get all bugs ironed out -if you look at what
surfaces outside its the people whose networks are a badly configured
mess (mine included), virtualised with very, very odd networks, clock
drift and RAM paging out under the OS's radar, or very different uses.

Those are the problems that will show up in the -beta phase, precisely
because -alpha isn't viewed as safe/stable. I expect some fun bugs to
surface there -bugs we aren't going to see in bigtop either, because
the kind of person who is trying to set up a cluster using the IP
address pool 127.0.1.* isn't the kind of person who is going to check
out bigtop and run nightly tests.

-Steve

Re: [VOTE] Release plan for Hadoop 2.0.5

Reply via email to