I've been hunting down some flaky tests as well -- a few weeks back I was testing some changes along the lines of HBASE-4326. (Maybe some of these are fixed by now?)
First, two tests seemed to flake fairly frequently, likely due to problems internal to the tests themselves (TestReplication, TestMasterFailover). There is a second set of failures that, after applying a draft of HBASE-4326, seems to move to a different set of tests. I'm pretty convinced there are some cross-test problems with these. This was on an 0.90.4-based branch, and by now several more changes have gone in. I'm getting back to HBASE-4326 and will try to get more stats on this.

Alternatively, I exclude the tests I identify as flaky from the main test run and have a separate test run that runs only the flaky tests. The hooks for the excludes build are in the hbase pom, but they only work with maven surefire 2.6, or 2.10 when it comes out (there is a bug in the surefire releases in between). See this jira for more details: http://jira.codehaus.org/browse/SUREFIRE-766
I've put rough sketches of the pom hook and of a hang-detection script at the bottom of this mail.

Jon.

On Sun, Sep 25, 2011 at 2:27 PM, lars hofhansl <[email protected]> wrote:

> At Salesforce we call these "flappers" and they are considered almost worse
> than failing tests, as they add noise to a test run without adding confidence.
> A test that fails once in - say - 10 runs is worthless.
>
>
> ________________________________
> From: Ted Yu <[email protected]>
> To: [email protected]
> Sent: Sunday, September 25, 2011 1:41 PM
> Subject: Re: maintaining stable HBase build
>
> As of 1:38 PST Sunday, the three builds all passed.
>
> I think we have some tests that exhibit non-deterministic behavior.
>
> I suggest committers interleave patch submissions by a 2-hour span so that we
> can more easily identify the patch(es) that break the build.
>
> Thanks
>
> On Sun, Sep 25, 2011 at 7:45 AM, Ted Yu <[email protected]> wrote:
>
> > I wrote a short blog post:
> > http://zhihongyu.blogspot.com/2011/09/streamlining-patch-submission.html
> >
> > It is geared towards contributors.
> >
> > Cheers
> >
> > On Sat, Sep 24, 2011 at 9:16 AM, Ramakrishna S Vasudevan 00902313 <
> > [email protected]> wrote:
> >
> >> Hi
> >>
> >> Ted, I agree with you. Pasting the test case results in the JIRA is also
> >> fine, especially when there are test case failures in a local run; if we
> >> feel a failure is not due to the fix we added, we can mention that as well.
> >> I think it is better to run on a Linux box rather than a Windows machine.
> >>
> >> +1 for your suggestion Ted.
> >>
> >> Can we add a feature like in HDFS, where Jenkins runs the test cases
> >> automatically when we submit a patch?
> >>
> >> At least till this is done, I'll go with your suggestion.
> >>
> >> Regards
> >> Ram
> >>
> >> ----- Original Message -----
> >> From: Ted Yu <[email protected]>
> >> Date: Saturday, September 24, 2011 4:22 pm
> >> Subject: maintaining stable HBase build
> >> To: [email protected]
> >>
> >> > Hi,
> >> > I want to bring the importance of maintaining a stable HBase build to
> >> > our attention.
> >> > A stable HBase build is important, not just for the next release but
> >> > also for authors of pending patches to verify the correctness of
> >> > their work.
> >> >
> >> > At some time on Thursday (Sept 22nd) the 0.90, 0.92 and TRUNK builds
> >> > were all blue. Now they're all red.
> >> >
> >> > I don't mind fixing the Jenkins build. But if we collectively adopt
> >> > some good practices, it would be easier to achieve the goal of having
> >> > stable builds.
> >> > For contributors, I understand that it takes so much time to run the
> >> > whole test suite that he/she may not have the luxury of doing this -
> >> > Apache Jenkins wouldn't do it when you press the Submit Patch button.
> >> > If this is the case (let's call it scenario A), please use Eclipse (or
> >> > another tool) to identify the tests that exercise the classes/methods
> >> > in your patch, and run them. Also clearly state in the JIRA which
> >> > tests you ran.
> >> >
> >> > If you have a Linux box where you can run the whole test suite, it
> >> > would be nice to utilize that resource and run the whole suite. Then
> >> > please state this fact on the JIRA as well.
> >> > Considering Todd's suggestion of holding off commits for 24 hours
> >> > after code review, a 2-hour test run isn't that long.
> >> >
> >> > Sometimes you may see the following (from 0.92 build 18):
> >> >
> >> > Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21
> >> >
> >> > [INFO] ------------------------------------------------------------------------
> >> > [INFO] BUILD FAILURE
> >> > [INFO] ------------------------------------------------------------------------
> >> > [INFO] Total time: 1:51:41.797s
> >> >
> >> > You should examine the test summary above these lines and find out
> >> > which test(s) hung. For this case it was TestMasterFailover:
> >> >
> >> > Running org.apache.hadoop.hbase.master.TestMasterFailover
> >> > Running org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
> >> > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec
> >> >
> >> > I think a script should be developed that parses the test output and
> >> > identifies the hanging test(s).
> >> >
> >> > For scenario A, I hope a committer would run the test suite.
> >> > The net effect would be a statement on the JIRA saying all tests
> >> > passed.
> >> > Your comments/suggestions are welcome.
> >> >
> >
>

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [email protected]
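
P.S. For anyone who wants to poke at the excludes hook before digging into the pom, here's roughly the shape of it. This is a sketch, not the actual hbase pom -- the profile id and the excluded test names are just illustrative:

<!-- Sketch of a surefire excludes hook. Profile id and test names
     are illustrative, not the actual hbase pom contents. -->
<profile>
  <id>skipFlakyTests</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <!-- Excludes need surefire 2.6 (or 2.10 when it is out);
             the releases in between hit SUREFIRE-766. -->
        <version>2.6</version>
        <configuration>
          <excludes>
            <exclude>**/TestReplication.java</exclude>
            <exclude>**/TestMasterFailover.java</exclude>
          </excludes>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>

The flaky-only run is then the mirror image: a second profile that puts the same list in <includes>, so the flappers still get exercised, just in a build whose failures don't block the main run.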
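
And here's a quick sketch of the kind of hang-detecting script Ted mentions at the end of the thread. It assumes sequential surefire console output, where each test class prints a "Running <class>" line and, on completion, a per-class "Tests run: ..., Time elapsed: ..." line (the final aggregate summary has no "Time elapsed", so it won't be mistaken for a per-class result). Untested, just to show the idea:

#!/usr/bin/env python
# Flag test classes that started but never reported results in
# surefire console output. A class whose "Running" line is never
# followed by a "Tests run: ..., Time elapsed: ..." line likely hung.
import re
import sys

def find_hung_tests(lines):
    hung = []
    pending = None  # class that started but hasn't reported yet
    for line in lines:
        m = re.match(r'Running (\S+)', line.strip())
        if m:
            if pending is not None:
                hung.append(pending)  # started, never reported
            pending = m.group(1)
        elif 'Tests run:' in line and 'Time elapsed:' in line:
            pending = None  # the pending class reported its results
    if pending is not None:
        hung.append(pending)  # still unreported at end of output
    return hung

if __name__ == '__main__':
    for test in find_hung_tests(sys.stdin):
        print('Possibly hung: %s' % test)

Run it over a saved console log (e.g. python find_hung_tests.py < consoleText); on the 0.92 build 18 output quoted above it would flag TestMasterFailover.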
