I was thinking more along these lines: either fix the test so it does not flap, or remove it.
The first task would be to identify all tests that frequently show non-deterministic results.

________________________________
From: Ted Yu <[email protected]>
To: [email protected]; lars hofhansl <[email protected]>
Sent: Monday, September 26, 2011 2:08 AM
Subject: Re: maintaining stable HBase build

Below is a simple script to repeatedly run a unit test.
I suggest using it or a similar script on the new unit test(s) in future patches.

#!/bin/bash
# script to run a test repeatedly; stops at the first failing run
# usage: ./runtest.sh <name of test> <number of repetitions>
#
for (( i = 1; i <= $2; i++ ))
do
  # run Maven at reduced priority
  nice -n 10 mvn test -Dtest="$1"
  if [ $? -ne 0 ]; then
    echo "$1 failed"
    exit 1
  fi
done

Thanks

On Sun, Sep 25, 2011 at 2:27 PM, lars hofhansl <[email protected]> wrote:

>At Salesforce we call these "flappers" and they are considered almost worse than failing tests,
>as they add noise to a test run without adding confidence.
>A test that fails once in - say - 10 runs is worthless.
>
>________________________________
>From: Ted Yu <[email protected]>
>To: [email protected]
>Sent: Sunday, September 25, 2011 1:41 PM
>Subject: Re: maintaining stable HBase build
>
>As of 1:38 PST Sunday, the three builds all passed.
>
>I think we have some tests that exhibit nondeterministic behavior.
>
>I suggest committers space patch submissions 2 hours apart so that we
>can more easily identify patch(es) that break the build.
>
>Thanks
>
>On Sun, Sep 25, 2011 at 7:45 AM, Ted Yu <[email protected]> wrote:
>
>> I wrote a short blog:
>> http://zhihongyu.blogspot.com/2011/09/streamlining-patch-submission.html
>>
>> It is geared towards contributors.
>>
>> Cheers
>>
>> On Sat, Sep 24, 2011 at 9:16 AM, Ramakrishna S Vasudevan 00902313 <
>> [email protected]> wrote:
>>
>>> Hi
>>>
>>> Ted, I agree with you. Pasting the test case results in JIRA is also fine,
>>> mainly when there are some test case failures when we run locally. If we
>>> feel a failure is not due to the fix we have added, we can mention that also.
>>> I think it's better to run on a Linux box rather than on a Windows machine.
>>>
>>> +1 for your suggestion Ted.
>>>
>>> Can we add the feature, like in HDFS, where Jenkins automatically runs the
>>> test cases when we submit a patch?
>>>
>>> At least until this is done, I'll go with your suggestion.
>>>
>>> Regards
>>> Ram
>>>
>>> ----- Original Message -----
>>> From: Ted Yu <[email protected]>
>>> Date: Saturday, September 24, 2011 4:22 pm
>>> Subject: maintaining stable HBase build
>>> To: [email protected]
>>>
>>> > Hi,
>>> > I want to bring the importance of maintaining a stable HBase build to
>>> > our attention.
>>> > A stable HBase build is important, not just for the next release but also
>>> > for authors of the pending patches to verify the correctness of their work.
>>> >
>>> > At some time on Thursday (Sept 22nd) 0.90, 0.92 and TRUNK builds were all
>>> > blue. Now they're all red.
>>> >
>>> > I don't mind fixing the Jenkins build. But if we collectively adopt some good
>>> > practice, it would be easier to achieve the goal of having stable builds.
>>> > For contributors, I understand that it takes so much time to run the whole
>>> > test suite that he/she may not have the luxury of doing this - Apache
>>> > Jenkins wouldn't do it when you press the Submit Patch button.
>>> > If this is the case (let's call it scenario A), please use Eclipse (or other
>>> > tool) to identify tests that exercise the classes/methods in your patch and
>>> > run them. Also clearly state what tests you ran in the JIRA.
>>> >
>>> > If you have a Linux box where you can run the whole test suite, it would be
>>> > nice to utilize such a resource and run the whole suite. Then please state
>>> > this fact on the JIRA as well.
>>> > Considering Todd's suggestion of holding off commit for 24 hours after code
>>> > review, a 2 hour test run isn't that long.
>>> >
>>> > Sometimes you may see the following (from 0.92 build 18):
>>> >
>>> > Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21
>>> >
>>> > [INFO] ------------------------------------------------------------------------
>>> > [INFO] BUILD FAILURE
>>> > [INFO] ------------------------------------------------------------------------
>>> > [INFO] Total time: 1:51:41.797s
>>> >
>>> > You should examine the test summary above these lines and find out
>>> > which test(s) hung. For this case it was TestMasterFailover:
>>> >
>>> > Running org.apache.hadoop.hbase.master.TestMasterFailover
>>> > Running org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
>>> > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec
>>> >
>>> > I think a script should be developed that parses test output and
>>> > identifies hanging test(s).
>>> >
>>> > For scenario A, I hope the committer would run the test suite.
>>> > The net effect would be a statement on the JIRA, saying all tests passed.
>>> > Your comments/suggestions are welcome.
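The original message suggests a script that parses test output and identifies hanging tests. Below is a minimal sketch of that idea; it is not something posted in this thread. It assumes tests run sequentially and that the log uses the format quoted in the original message: a "Running <class>" line, later followed by a per-class "Tests run: ..., Time elapsed: ..." line. A class that has a "Running" line but no result line before the next "Running" line (or before the end of the log) is reported. The function name find_hung_tests is hypothetical.

```shell
#!/bin/bash
# Sketch: list test classes that started but never reported a result.
# Assumes sequential test execution and the log format quoted in this
# thread; find_hung_tests is a hypothetical helper name.
find_hung_tests() {
  awk '
    # A new test class starts. If the previous class never reported
    # a result, it hung or crashed, so print it.
    /^Running / {
      if (pending != "") print pending
      pending = $2
      next
    }
    # Per-class result line (the final summary has no "Time elapsed").
    /^Tests run:.*Time elapsed/ { pending = "" }
    # The last class that started may also have hung.
    END { if (pending != "") print pending }
  ' "$@"
}
```

Usage, assuming the build output was saved to a file (the name build.log is made up for illustration): source the script, then run `find_hung_tests build.log`. The function also reads standard input, so `mvn test | tee build.log | find_hung_tests` would work the same way. With parallel test execution the "Running" and result lines can interleave, and this heuristic would need to match result lines to class names instead.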
