Thanks for that! Once the build is green, it becomes much easier to keep it so.
-ryan

On Fri, Oct 15, 2010 at 10:47 AM, Stack <[email protected]> wrote:
> Yeah, I'm fixing it (smile).
>
> Tests are almost back to normal. There's still some flakiness to
> eradicate. Almost there.
>
> While there's a bit of a focus on tests, I'd like to petition that
> going forward we do all we can to keep tests in the blue. Here's why
> (mostly informed by what I learned over the last week working with
> Hudson):
>
> + Hudson is always right. If he fails a build, there is a cause. The
> cause of failure may be indecipherable, seemingly from the realm of
> shadows, but digging will turn up the cause. Eventually. Here's some
> recent 'interesting' illustration:
> ++ Our TableOutputFormat has been broken, probably since the day it
> was originally written more than a year (or two) ago, in that it was
> not reading the config set by job setup. This, plus a test that was
> leaving up a ZooKeeper ensemble -- yet to be found -- was the root
> cause of the sporadic TestTableMapReduce failures.
> ++ Clients could always time out their session on ZooKeeper,
> especially when the zk ensemble was restarted as part of a unit test
> (TestClusterRestart). A timed-out client hosts stale data; i.e., it's
> no longer updated by watchers. Up until the new master commit, these
> session expirations were rarely troublesome; the stale data was
> usually sufficient to complete the test successfully. Failures were
> rare but possible. (With the new master, there's more riding on zk
> watchers working, so a lost session should be more obvious.)
>
> + We can't let broken tests go unaddressed again. If tests strike up
> a failing pattern in Hudson, we all get lazy about running tests at
> all. We lose the benefit unit tests bring: they turn up the side
> effects we did not consider. While the new master checkin was
> responsible for a portion of the failures of late, what has been
> interesting to me is how many of the recent test failures were not
> related at all.
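[Editor's note: the TableOutputFormat failure described above is an instance of a common pitfall: a component is handed the job configuration at setup time but keeps using its own defaults instead of reading it. The sketch below is hypothetical, JDK-only illustration -- none of these class or key names are HBase's actual code -- showing the broken and the fixed pattern side by side.]

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "ignores the injected config" bug pattern.
// A plain Map stands in for the job configuration written by job setup.
public class ConfigSketch {

    // Broken pattern: accepts the configuration but never reads it,
    // so whatever job setup wrote is silently ignored.
    static String brokenTableName(Map<String, String> conf) {
        return "default-table"; // the bug: conf is never consulted
    }

    // Fixed pattern: reads the value that job setup actually wrote,
    // falling back to the default only when the key is absent.
    static String fixedTableName(Map<String, String> conf) {
        return conf.getOrDefault("output.table", "default-table");
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("output.table", "usertable"); // done by job setup
        System.out.println(brokenTableName(jobConf)); // prints default-table
        System.out.println(fixedTableName(jobConf));  // prints usertable
    }
}
```

The failure mode is nasty precisely because the broken version still works whenever the default happens to match, which is how such a bug can survive for a year or two.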
> There were tests that tested nothing and failed (i.e., the putting up
> of two HBaseTestingUtilities in the one JVM, but this doesn't work
> yet, so the test would hang on close), tests that were working under
> presumptions long since abandoned (TestMergeMeta wanted to do exactly
> that, merge meta, a facility we frustrated a while back), and tests
> that had been broken by a refactoring unrelated to the new master
> (TestSplitTransaction had a means of distinguishing testing from
> normal running that was broken).
>
> + A good few tests -- maybe 5 in the end -- were not completing when
> the test suite was run, and Maven would step in and kill them. These
> tests prevented the tests behind them from running. A few of these
> were checked-in tests that could never have worked. For example,
> TestDeadServers, a test I committed, was plain broken. It could never
> have worked. It looks like I checked in a version that was
> incomplete. Or TestLoadBalancer, which was using an unresolvable
> hostname. How could that ever have worked? Because of the tests that
> were not completing, Hudson did not have a chance to flag the broken
> commits. I've changed the timeout on tests so we'll cut in after 15
> minutes. It's also good practice to add the JUnit 4 (timeout = N)
> qualification to the @Test annotation. Set it to 3 or 5 minutes or
> something. Unfortunately, a bunch of our tests are still JUnit 3 and
> the timeout is not an option (that I know of).
>
> St.Ack
> P.S. I still love unit tests even when they are a pain.
>
>
> On Fri, Oct 15, 2010 at 9:41 AM, Steven Noels <[email protected]>
> wrote:
>> This must be a mistake. |-)
>>
>> On Fri, Oct 15, 2010 at 9:52 AM, Apache Hudson Server <
>> [email protected]> wrote:
>>
>>> See <https://hudson.apache.org/hudson/job/HBase-TRUNK/1551/changes>
>>
>> --
>> Steven Noels
>> http://outerthought.org/
>> Open Source Content Applications
>> Makers of Kauri, Daisy CMS and Lily
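[Editor's note: in JUnit 4, the per-test guard Stack recommends is written as, e.g., @Test(timeout = 300000) for a five-minute limit. The class below is not HBase or JUnit code; it is a minimal JDK-only sketch of what such a guard does under the hood: run the test body on a separate thread and abandon it after a deadline, so one hung test cannot block everything behind it.]

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a test-timeout guard, illustrating the mechanism
// behind JUnit 4's @Test(timeout = N).
public class TimeoutSketch {

    // Runs the given "test body" and returns true if it finished in time,
    // false if it had to be abandoned (a hang, in Hudson terms).
    static boolean runWithTimeout(Runnable testBody, long millis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> f = pool.submit(testBody);
        try {
            f.get(millis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            f.cancel(true); // interrupt the hung test body
            return false;
        } catch (InterruptedException | ExecutionException e) {
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A quick test body completes within its budget...
        System.out.println(runWithTimeout(() -> { }, 1000));
        // ...while a hanging one is cut off instead of stalling the suite.
        System.out.println(runWithTimeout(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { }
        }, 200));
    }
}
```

This is also roughly what the suite-level 15-minute cutoff mentioned above buys: the hang is converted into a visible failure, so the build tool can flag the broken commit instead of silently skipping everything queued behind it.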
