Yeah, I'm fixing it (smile). Tests are almost back to normal. There's still some flakiness to eradicate. Almost there.
While there's a bit of a focus on tests, I'd like to petition that going forward we do all we can to keep tests in the blue. Here's why (mostly informed by what I learned over the last week working with hudson):

+ Hudson is always right. If he fails a build, there is a cause. The cause of failure may be indecipherable, seemingly from the realm of shadows, but digging will turn up the cause. Eventually. Here are some recent 'interesting' illustrations:

++ Our TableOutputFormat has been broke, probably since the day it was originally written more than a year (or two) ago, in that it was not reading the config set by job setup. This, plus a test that was leaving up a zookeeper ensemble -- yet to be found -- was the root cause of the sporadic TestTableMapReduce failings.

++ Clients could always time out their session on zookeeper, especially when the zk ensemble was restarted as part of a unit test (TestClusterRestart). A timed-out client hosts stale data; i.e. it's no longer updated by watchers. Up until the new master commit, these session expirations were rarely troublesome; the stale data was usually sufficient to complete the test successfully. Failures were rare but possible. (With the new master, there's more riding on zk watchers working, so a lost session should be more obvious.)

+ We can't let broke tests go unaddressed again. If tests strike up a failing pattern in hudson, we all get lazy about running tests at all. We lose the benefit unit tests bring, where they turn up the side effects we didn't consider. While the new master checkin was responsible for a portion of the failures of late, what has been interesting to me is how many of the recent test fails were not related at all. There were tests that tested nought and failed (i.e.
the putting up of two HBaseTestingUtilities in the one JVM -- this doesn't work yet, so the test would hang on close), tests that were working under presumptions long since abandoned (TestMergeMeta wanted to do exactly that, merge meta, a facility we frustrated a while back), or tests that had been broken by a refactoring unrelated to the new master (TestSplitTransaction had a means of distinguishing testing from normal running that was broke).

+ A good few tests -- maybe 5 in the end -- were not completing when the test suite was run, and maven would step in and kill them. These tests prevented the tests behind them from running. A few of these were checked-in tests that could never have worked. For example, TestDeadServers, a test I committed, was plain broke. It could never have worked; it looks like I checked in a version that was incomplete. Or TestLoadBalancer, which was using an unresolvable hostname. How could that ever have worked? Because of the tests that were not completing, hudson did not have a chance to flag the broke commits.

I've changed the timeout on tests so we'll cut in after 15 minutes. It's also good practice to add the junit4 (timeout = N) parameter to the @Test annotation. Set it to 3 or 5 minutes or something. Unfortunately, a bunch of our tests are still junit3, where the timeout is not an option (that I know of).

St.Ack

P.S. I still love unit tests even when they are a pain.

On Fri, Oct 15, 2010 at 9:41 AM, Steven Noels <[email protected]> wrote:
> This must be a mistake. |-)
>
> On Fri, Oct 15, 2010 at 9:52 AM, Apache Hudson Server <
> [email protected]> wrote:
>
>> See <https://hudson.apache.org/hudson/job/HBase-TRUNK/1551/changes>
>>
>>
>>
>
> --
> Steven Noels
> http://outerthought.org/
> Open Source Content Applications
> Makers of Kauri, Daisy CMS and Lily
>
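[Editor's note: a minimal sketch of the junit4 per-test timeout mentioned above. The (timeout = N) value is in milliseconds, so 3 minutes is 180000. The class and method names here are hypothetical, not from any HBase test; the main() runner is only there to make the snippet self-contained.]

```java
import org.junit.Test;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;

public class TimeoutSketch {

    public static class SlowTests {
        // JUnit4 runs this test in a watched thread and marks it
        // failed if it is still running after 3 minutes (180000 ms).
        @Test(timeout = 180000)
        public void testCompletesInTime() throws Exception {
            Thread.sleep(10); // stand-in for real test work
        }
    }

    public static void main(String[] args) {
        // Run the nested test class programmatically.
        Result result = JUnitCore.runClasses(SlowTests.class);
        System.out.println("run=" + result.getRunCount()
                + " failures=" + result.getFailureCount());
        // prints "run=1 failures=0"
    }
}
```

Note this timeout is per test method; the 15-minute maven-level cutoff mentioned in the mail is a separate, suite-level safety net that kills the forked test JVM.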
