Rocking effort Stack!!!  Thanks.

Regards
Ram

On Fri, Nov 6, 2015 at 1:40 AM, Stack <st...@duboce.net> wrote:

> On Thu, Nov 5, 2015 at 8:07 AM, Andrew Purtell <andrew.purt...@gmail.com>
> wrote:
>
> > > Hanging tests have been fixed and/or disabled to be put back after
> > > scrubbing.
> >
> > What do you think about an interim step that adds a flaky test category
> > and a profile that disables them only on builds.a.o, i.e. the Jenkins job
> > configuration turns them off? Is that possible? I'd like to continue
> > running these on my build rigs since they are better endowed than
> > builds.a.o resources. Or at least a profile that can turn them on?
> >
> >
> We could do such a thing. Probably better than the current hackery where
> the test is just disabled with JIRAs to fix ...sometime.
>
>
>
> > > This is a petition that we go out of our way going forward to keep OUR
> > > test suite blue.
> >
> > Big +1 here
> >
> >
> Yeah. It's got to be a group thing.
>
>
>
> > BTW it turns out after seeing the results of your effort that most of my
> > issues with builds.a.o were probably due to the broken zombie killing
> > thing. That's why locally run stuff (also under Jenkins sometimes btw) was
> > just so much more stable. Can we have review and SCM of our build
> > configurations somehow going forward?
> >
> >
> Makes sense (and still work to do on zombie detector). Let me work on it.
> St.Ack
>
> >
> > > On Oct 23, 2015, at 2:54 PM, Stack <st...@duboce.net> wrote:
> > >
> > > A few of us have been doing cleanup over the last month or so (see
> > > HBASE-14420). As a project, we had let our unit test suite go to seed. It
> > > was an anthology of mysterious crashes, zombies and flakes.
> > >
> > > We are not done yet but tests are mostly stable again, with patch builds
> > > passing close to 100% of the time as long as the patch is good, and trunk
> > > and branch-1/branch-1.2 are tending back toward being blue always. Hanging
> > > tests have been fixed and/or disabled to be put back after scrubbing.
> > > Mysterious surefire crashes/timeouts have been addressed by purging a
> > > problematic test set that we intend to re-add after tuneup and fix. There
> > > are still a few flakies in the mix.
> > >
> > > This is a petition that we go out of our way going forward to keep OUR test
> > > suite blue. We'll all be more productive if we can keep it this way.
> > > Patches will land faster because there'll be less friction getting them in
> > > (landing big patches was taking me a week before starting in on this
> > > effort). We'll catch a slew of problems before commit. New devs won't be
> > > confounded by mysterious unrelated test fails. There'll be no need to keep
> > > up an arcane knowledge of 'known flakies' or hanging tests, or to expend
> > > extra effort and resources doing 'look-it-works-locally-for-me' test runs.
> > >
> > > St.Ack
> > >
> > > Below are some further notes for those interested in the build and the work
> > > done on our test rig recently; ugly detail is over in HBASE-14420.
> > >
> > > Until an alternative shows up, our Apache Jenkins needs to run blue always
> > > if we want to do community development. True, Apache Jenkins is a trying
> > > environment in which to run tests, but it is shared, public, and I have yet
> > > to come across a hang or failure that was Apache-Jenkins-only; the only
> > > difference I've seen is that the incidence of hangs and flakies is higher
> > > on Apache.
> > >
> > > The test-patch.sh script had some hacking done to it, mostly removing code
> > > that was finding and killing zombies. We were reporting ANY concurrent
> > > build as a zombie, even those that were not hbase tests, and killing them
> > > in the belief that they were leftovers from previous runs (the script had a
> > > few different techniques for finding and executing adjacent processes).
> > > This made some sense when we were supposed to be the only test running on
> > > the box but this has not been true for a long time. Killing was
> > > papering over the fact that we were leaving zombies after us.
> > >
> > > The Jenkins build configuration also had zombie code from test-patch.sh in
> > > it (still does -- a TODO). Builds now dump out test machine load and a
> > > listing of what else is running on the box at test start to give a sense of
> > > how loaded the test box is.
> > >
> > > I feel particularly bad for the new contributors. They have it hard enough
> > > already checking out a fat project with a slow build system with hours of
> > > tests to run to verify changes. Let's spare them the added barrier of a
> > > confounding experience when their nice patch throws up a mysterious Jenkins
> > > fail on submit.
> >
>
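On the flaky test category idea quoted above, here is a minimal sketch of the gating logic, written in Python purely for illustration. HBase's tests are Java/JUnit, so a real version would presumably be a JUnit category plus a Maven profile; this is not that. The `flaky` decorator and the `SKIP_FLAKY_TESTS` environment variable are hypothetical names invented for the example, not part of the HBase build.

```python
import functools
import os
import unittest


def flaky(test_func):
    """Mark a test method as flaky.

    The test is skipped only when the hypothetical SKIP_FLAKY_TESTS
    environment variable is "true" (what a shared CI job might set).
    On a private build rig the variable stays unset and the test runs.
    """
    @functools.wraps(test_func)
    def wrapper(self, *args, **kwargs):
        if os.environ.get("SKIP_FLAKY_TESTS", "").lower() == "true":
            self.skipTest("flaky test disabled on this build")
        return test_func(self, *args, **kwargs)
    return wrapper


class ExampleTest(unittest.TestCase):
    def test_stable(self):
        self.assertEqual(1 + 1, 2)

    @flaky
    def test_sometimes_hangs(self):
        # Stand-in for a test that is unreliable on a loaded box.
        self.assertTrue(True)


def run_example():
    """Run ExampleTest and return (tests_run, tests_skipped)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.testsRun, len(result.skipped)
```

The point of checking the switch at run time is that the same test code serves both cases: the shared Jenkins job sets the variable, and anyone with a better-endowed rig simply leaves it unset.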

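The zombie-killing problem described in the quoted notes (treating ANY concurrent build as a leftover) comes down to how candidate processes are selected. Below is a rough sketch in Python, assuming each build tags its own test processes with a marker on the command line. The marker string and the `(pid, command_line)` input format are assumptions made for illustration; this is not how test-patch.sh works.

```python
def find_own_zombies(processes, marker, current_pids):
    """Return pids of leftover processes that belong to our own builds.

    processes    -- iterable of (pid, command_line) pairs, e.g. parsed from `ps`
    marker       -- string that only our test processes carry on their command
                    line (hypothetical, e.g. a system property set by the build)
    current_pids -- set of pids started by the build that is running right now
    """
    zombies = []
    for pid, cmdline in processes:
        if marker not in cmdline:
            continue  # someone else's build on the shared box: leave it alone
        if pid in current_pids:
            continue  # part of the current run, not a leftover
        zombies.append(pid)
    return zombies
```

For example, given three processes where only two carry the marker and one of those belongs to the current run, only the remaining marked process is reported:

```python
procs = [
    (101, "java -Dexample.build.marker=yes TestFoo"),
    (102, "java -jar some-other-project.jar"),
    (103, "java -Dexample.build.marker=yes TestBar"),
]
leftovers = find_own_zombies(procs, "example.build.marker", {103})
```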