Yeah, I'm fixing it (smile). Tests are almost back to normal. There's still some flakiness to eradicate. Almost there.
While there's a bit of a focus on tests, I'd like to petition that going forward we do all we can to keep tests in the blue. Here's why (mostly informed by what I learned over the last week working with hudson):

+ Hudson is always right. If he fails a build, there is a cause. The cause of failure may be indecipherable, seemingly from the realm of shadows, but digging will turn up the cause. Eventually. Here are some recent 'interesting' illustrations:

++ Our TableOutputFormat has been broke, probably since the day it was originally written more than a year (or two) ago, in that it was not reading the config set by job setup. This, plus a test that was leaving up a zookeeper ensemble -- yet to be found -- was the root cause of the sporadic TestTableMapReduce failings.

++ Clients could always time out their session on zookeeper, especially when the zk ensemble was restarted as part of a unit test (TestClusterRestart). A timed-out client hosts stale data; i.e. it's no longer updated by watchers. Up until the new master commit, these session expirations were rarely troublesome; the stale data was usually sufficient to complete the test successfully. Failures were rare but possible. (With the new master, there's more riding on zk watchers working, so a lost session should be more obvious.)

+ We can't let broke tests go unaddressed again. If tests strike up a failing pattern in hudson, we all get lazy about running tests at all. We lose the benefit unit tests bring, where they turn up the side effects we didn't consider. While the new master checkin was responsible for a portion of the failures of late, what has been interesting to me is how many of the recent test fails were not related at all. There were tests that tested nought and failed (i.e.
the putting up of two HBaseTestingUtilities in the one JVM -- this doesn't work yet, so the test would hang on close), tests that were working under presumptions long since abandoned (TestMergeMeta wanted to do exactly that, merge meta, a facility we frustrated a while back), or tests that had been broken by a refactoring unrelated to the new master (TestSplitTransaction had a means of distinguishing testing from normal running that was broke).

+ A good few tests -- maybe 5 in the end -- were not completing when the test suite was run, and maven would step in and kill them. These tests prevented the tests behind them from running. A few of these were checked-in tests that could never have worked. For example, TestDeadServers, a test I committed, was plain broke. It could never have worked; it looks like I checked in a version that was incomplete. Or TestLoadBalancer, which was using an unresolvable hostname. How could that ever have worked? Because of the tests that were not completing, hudson did not have a chance to flag the broke commits.

I've changed the timeout on tests so we'll cut in after 15 minutes. It's also good practice to add the junit4 (timeout = N) parameter to the @Test annotation. Set it to 3 or 5 minutes or something. Unfortunately, a bunch of our tests are still junit3, where the timeout is not an option (that I know of).

St.Ack

P.S. I still love unit tests even when they are a pain.

On Fri, Oct 15, 2010 at 9:41 AM, Steven Noels <[email protected]> wrote:
> This must be a mistake. |-)
>
> On Fri, Oct 15, 2010 at 9:52 AM, Apache Hudson Server <
> [email protected]> wrote:
>
>> See <https://hudson.apache.org/hudson/job/HBase-TRUNK/1551/changes>
>>
>>
>>
>
> --
> Steven Noels
> http://outerthought.org/
> Open Source Content Applications
> Makers of Kauri, Daisy CMS and Lily
>
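[Editor's note: a minimal sketch of the junit4 per-test timeout mentioned above. The (timeout = N) value is in milliseconds, so 3 minutes is 180000. The class and method names here are hypothetical, not from any HBase test; the main() runner is only there to make the snippet self-contained.]

```java
import org.junit.Test;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;

public class TimeoutSketch {

    public static class SlowTests {
        // JUnit4 runs this test in a watched thread and marks it
        // failed if it is still running after 3 minutes (180000 ms).
        @Test(timeout = 180000)
        public void testCompletesInTime() throws Exception {
            Thread.sleep(10); // stand-in for real test work
        }
    }

    public static void main(String[] args) {
        // Run the nested test class programmatically.
        Result result = JUnitCore.runClasses(SlowTests.class);
        System.out.println("run=" + result.getRunCount()
                + " failures=" + result.getFailureCount());
        // prints "run=1 failures=0"
    }
}
```

Note this timeout is per test method; the 15-minute maven-level cutoff mentioned in the mail is a separate, suite-level safety net that kills the forked test JVM.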
