Re: maintaining stable HBase build

lars hofhansl Mon, 26 Sep 2011 10:45:51 -0700

I was thinking more along these lines:
either fix the test so it does not flap, or remove it.



The first task would be to identify all tests that frequently show 
non-deterministic results.
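
One possible way to do that, as a rough sketch only: run each suspect test a fixed number of times and count the failures instead of stopping at the first one, as the runtest.sh script quoted below does. The function names count_failures and classify are made up for illustration, and the "flapper" label simply means "failed in some runs but not all".

```shell
#!/bin/bash
# Sketch under assumptions: count how often a command fails over N runs.
# count_failures <repetitions> <command> [args...]
count_failures() {
  local reps=$1
  shift
  local failures=0
  local i
  for (( i = 1; i <= reps; i++ )); do
    # Output is discarded here; redirect to a log file instead if needed.
    if ! "$@" > /dev/null 2>&1; then
      failures=$(( failures + 1 ))
    fi
  done
  echo "$failures"
}

# classify <failures> <repetitions>
# Prints "stable", "failing", or "flapper" (some runs failed, but not all).
classify() {
  local failures=$1
  local reps=$2
  if (( failures == 0 )); then
    echo "stable"
  elif (( failures == reps )); then
    echo "failing"
  else
    echo "flapper"
  fi
}

# Example invocation (assumed, mirroring the mvn command in runtest.sh):
#   n=$(count_failures 10 mvn test -Dtest=TestMasterFailover)
#   echo "TestMasterFailover: $n/10 failed -> $(classify "$n" 10)"
```

Running this over the whole list of test classes would give a first cut of which tests flap and how often, though a small number of repetitions can easily miss a test that fails only rarely.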



________________________________
From: Ted Yu <[email protected]>
To: [email protected]; lars hofhansl <[email protected]>
Sent: Monday, September 26, 2011 2:08 AM
Subject: Re: maintaining stable HBase build


Below is a simple script to repeatedly run a unit test.
I suggest using it, or a similar script, on the new unit test(s) in future patches.

#!/bin/bash
# Run a single unit test repeatedly; stop at the first failure.
# usage: ./runtest.sh <name of test> <number of repetitions>
#
for (( i = 1; i <= $2; i++ ))
do
  # "nice -n 10" is the current spelling of the obsolete "nice -10"
  if ! nice -n 10 mvn test -Dtest="$1"; then
    echo "$1 failed on run $i"
    exit 1
  fi
done

Thanks


On Sun, Sep 25, 2011 at 2:27 PM, lars hofhansl <[email protected]> wrote:

>At Salesforce we call these "flappers" and they are considered almost worse
>than failing tests, as they add noise to a test run without adding confidence.
>A test that fails once in, say, 10 runs is worthless.
>
>
>
>________________________________
>From: Ted Yu <[email protected]>
>To: [email protected]
>Sent: Sunday, September 25, 2011 1:41 PM
>Subject: Re: maintaining stable HBase build
>
>
>As of 1:38 PST Sunday, the three builds all passed.
>
>I think we have some tests that exhibit non-deterministic behavior.
>
>I suggest committers space their patch submissions about 2 hours apart so that
>we can more easily identify the patch(es) that break the build.
>
>Thanks
>
>On Sun, Sep 25, 2011 at 7:45 AM, Ted Yu <[email protected]> wrote:
>
>> I wrote a short blog:
>> http://zhihongyu.blogspot.com/2011/09/streamlining-patch-submission.html
>>
>> It is geared towards contributors.
>>
>> Cheers
>>
>>
>> On Sat, Sep 24, 2011 at 9:16 AM, Ramakrishna S Vasudevan 00902313 <
>> [email protected]> wrote:
>>
>>> Hi
>>>
>>> Ted, I agree with you.  Pasting the test results in JIRA is also fine,
>>> especially when some tests fail locally; if we believe the failures are
>>> not caused by our fix, we can mention that as well.  I think it is better
>>> to run the tests on a Linux box than on a Windows machine.
>>>
>>> +1 for your suggestion Ted.
>>>
>>> Can we add a feature like the one in HDFS, where Jenkins automatically
>>> runs the test cases when we submit a patch?
>>>
>>> At least until this is done, I will go with your suggestion.
>>>
>>> Regards
>>> Ram
>>>
>>> ----- Original Message -----
>>> From: Ted Yu <[email protected]>
>>> Date: Saturday, September 24, 2011 4:22 pm
>>> Subject: maintaining stable HBase build
>>> To: [email protected]
>>>
>>> > Hi,
>>> > I want to bring the importance of maintaining a stable HBase build to
>>> > our attention.
>>> > A stable HBase build is important, not just for the next release but
>>> > also for authors of pending patches to verify the correctness of
>>> > their work.
>>> >
>>> > At some time on Thursday (Sept 22nd) the 0.90, 0.92 and TRUNK builds
>>> > were all blue. Now they're all red.
>>> >
>>> > I don't mind fixing the Jenkins build. But if we collectively adopt
>>> > some good practices, it would be easier to achieve the goal of having
>>> > stable builds.
>>> > For contributors, I understand that running the whole test suite takes
>>> > so much time that they may not have the luxury of doing so, and Apache
>>> > Jenkins won't do it when you press the Submit Patch button.
>>> > If this is the case (let's call it scenario A), please use Eclipse (or
>>> > another tool) to identify tests that exercise the classes/methods in
>>> > your patch and run them. Also clearly state which tests you ran in the
>>> > JIRA.
>>> >
>>> > If you have a Linux box where you can run the whole test suite, it
>>> > would be nice to use that resource and run the whole suite. Then
>>> > please state this fact on the JIRA as well.
>>> > Considering Todd's suggestion of holding off commits for 24 hours
>>> > after code review, a 2 hour test run isn't that long.
>>> >
>>> > Sometimes you may see the following (from 0.92 build 18):
>>> >
>>> > Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21
>>> >
>>> > [INFO] ------------------------------------------------------------------------
>>> > [INFO] BUILD FAILURE
>>> > [INFO] ------------------------------------------------------------------------
>>> > [INFO] Total time: 1:51:41.797s
>>> >
>>> > You should examine the test summary above these lines and find out
>>> > which test(s) hung, i.e. which "Running" line has no matching result
>>> > line. In this case it was TestMasterFailover:
>>> >
>>> > Running org.apache.hadoop.hbase.master.TestMasterFailover
>>> > Running org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
>>> > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec
>>> >
>>> > I think a script should be developed that parses the test output and
>>> > identifies hanging test(s).
>>> >
>>> > For scenario A, I hope the committer would run the test suite.
>>> > The net effect would be a statement on the JIRA saying all tests
>>> > passed.
>>> > Your comments/suggestions are welcome.
>>> >
>>>
>>
>>
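
The original message in this thread asks for a script that parses test output and identifies hanging tests. A minimal sketch of that idea follows. It assumes tests run one at a time, that the log has the console format quoted in that message (a "Running <class>" line followed by a "Tests run: ..., Time elapsed: ..." line for each class that finishes), and that each of those lines starts at the beginning of the line. The function name find_hung_tests and the file name build.log are made up for illustration.

```shell
#!/bin/bash
# Sketch under assumptions: print every test class whose "Running ..." line
# is not followed by a per-class result line before the next "Running ..."
# line or the end of the log. Reads the console output on stdin.
find_hung_tests() {
  awk '
    /^Running / {
      # A new test starts; if the previous one never reported, it hung.
      if (pending != "") print pending
      pending = $2
      next
    }
    # Per-class result lines carry "Time elapsed"; the final summary
    # ("Tests run: 1004, ...") does not, so it is ignored here.
    /^Tests run:.*Time elapsed/ { pending = "" }
    END { if (pending != "") print pending }
  '
}

# Example invocation (hypothetical log file name):
#   find_hung_tests < build.log
```

With parallel test execution the "Running" and "Tests run" lines can interleave, so this pairing by order would no longer be reliable and the per-test report files would be a better source.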
