Hi Ted,

Yes, we need to investigate the hanging tests separately.

Regards
Ram
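(Ted's jstack suggestion in the reply quoted below can be scripted. A minimal sketch, assuming surefire's forked JVMs carry "surefirebooter" on their command line and that pgrep and jstack are on the PATH; the function name and output directory are made up for illustration:)

```shell
#!/usr/bin/env bash
# Sketch: grab a jstack from each running surefire fork so a hung test can
# be investigated before its JVM is killed. That the forks are identifiable
# by "surefirebooter" on the command line is an assumption to verify locally.

capture_stacks() {
  pattern="$1"   # pattern identifying the JVMs to dump
  outdir="$2"    # directory for the jstack output files
  mkdir -p "$outdir"
  for pid in $(pgrep -f "$pattern"); do
    # Ignore failures: the process may exit between pgrep and jstack,
    # or jstack may be unavailable on this box.
    jstack "$pid" > "$outdir/jstack-$pid.txt" 2>/dev/null || true
  done
}

capture_stacks surefirebooter target/stacks
```

Run between test batches, this would leave one stack dump per still-running fork for later inspection.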
-----Original Message-----
From: Ted Yu [mailto:[email protected]]
Sent: Tuesday, September 27, 2011 12:05 AM
To: [email protected]
Subject: Re: maintaining stable HBase build

>> we can kill the java processes that are hanging if any testcases hangs.
I think it is very important to find out why certain tests hang. Obtaining a
jstack is the first step in terms of investigation.

Regards

On Mon, Sep 26, 2011 at 11:31 AM, Ramakrishna S Vasudevan 00902313 <
[email protected]> wrote:

> Hi
>
> Just wanted to share one thing that I learnt today in Maven for running
> testcases. Maybe many already know this.
>
> We usually face problems like this: when we run testcases as a bunch, a
> few fail due to system problems or improper clean up by previous
> testcases.
>
> As Jon suggested, we can separate out the flaky test cases from the
> correct ones.
>
> In Maven we have a facility called profiles. We can take the testcases
> that we have separated out (maybe in 2 to 3 batches) and add them to
> separate profiles.
>
> We can invoke these profiles like mvn test -P "profileid".
>
> We can write a script that executes every profile, and in between
> profiles we can kill any java processes left hanging if a testcase hangs.
>
> Just a suggestion. If you feel it suits some need in any of your project
> work, you can use it.
>
> Regards
> Ram
>
> ----- Original Message -----
> From: Jonathan Hsieh <[email protected]>
> Date: Monday, September 26, 2011 11:15 pm
> Subject: Re: maintaining stable HBase build
> To: [email protected], lars hofhansl <[email protected]>
>
> > I've been hunting some flaky tests down as well -- a few weeks back I
> > was testing some changes along the line of HBASE-4326. (Maybe some of
> > these are fixed?)
> >
> > First, two tests seemed to flake fairly frequently and were likely
> > problems internal to the tests (TestReplication, TestMasterFailover).
> >
> > There is a second set of tests that, after applying a draft of
> > HBASE-4326, seems to move to a different set of tests. I'm pretty
> > convinced there are some cross-test problems with these. This was on an
> > 0.90.4 based branch, and by now several more changes have gone in. I'm
> > getting back to HBASE-4326 and will try to get more stats on this.
> >
> > Alternately, I exclude the tests that I identify as flaky from the main
> > test run and have a separate test run that only runs the flaky tests.
> > The hooks for the excludes build are in the hbase pom but only work
> > with maven surefire 2.6, or 2.10 when it comes out (there is a bug in
> > surefire). See this jira for more details:
> > http://jira.codehaus.org/browse/SUREFIRE-766
> >
> > Jon.
> >
> > On Sun, Sep 25, 2011 at 2:27 PM, lars hofhansl <[email protected]>
> > wrote:
> >
> > > At Salesforce we call these "flappers" and they are considered almost
> > > worse than failing tests, as they add noise to a test run without
> > > adding confidence.
> > > A test that fails once in - say - 10 runs is worthless.
> > >
> > > ________________________________
> > > From: Ted Yu <[email protected]>
> > > To: [email protected]
> > > Sent: Sunday, September 25, 2011 1:41 PM
> > > Subject: Re: maintaining stable HBase build
> > >
> > > As of 1:38 PST Sunday, the three builds all passed.
> > >
> > > I think we have some tests that exhibit non-deterministic behavior.
> > >
> > > I suggest committers interleave patch submissions by a 2 hour span so
> > > that we can more easily identify patch(es) that break the build.
> > >
> > > Thanks
> > >
> > > On Sun, Sep 25, 2011 at 7:45 AM, Ted Yu <[email protected]> wrote:
> > >
> > > > I wrote a short blog:
> > > > http://zhihongyu.blogspot.com/2011/09/streamlining-patch-submission.html
> > > > It is geared towards contributors.
> > > >
> > > > Cheers
> > > >
> > > > On Sat, Sep 24, 2011 at 9:16 AM, Ramakrishna S Vasudevan 00902313 <
> > > > [email protected]> wrote:
> > > >
> > > >> Hi
> > > >>
> > > >> Ted, I agree with you. Pasting the testcase results in the JIRA is
> > > >> also fine, mainly when there are some testcase failures when we
> > > >> run locally; if we feel a failure is not due to the fix we have
> > > >> added, we can mention that as well. I think rather than on a
> > > >> Windows machine it's better to run on a Linux box.
> > > >>
> > > >> +1 for your suggestion Ted.
> > > >>
> > > >> Can we add the feature, like in HDFS, where Jenkins automatically
> > > >> runs the testcases when we submit a patch?
> > > >>
> > > >> At least till this is done I go with your suggestion.
> > > >>
> > > >> Regards
> > > >> Ram
> > > >>
> > > >> ----- Original Message -----
> > > >> From: Ted Yu <[email protected]>
> > > >> Date: Saturday, September 24, 2011 4:22 pm
> > > >> Subject: maintaining stable HBase build
> > > >> To: [email protected]
> > > >>
> > > >> > Hi,
> > > >> > I want to bring the importance of maintaining a stable HBase
> > > >> > build to our attention.
> > > >> > A stable HBase build is important, not just for the next release
> > > >> > but also for authors of pending patches to verify the
> > > >> > correctness of their work.
> > > >> >
> > > >> > At some time on Thursday (Sept 22nd) the 0.90, 0.92 and TRUNK
> > > >> > builds were all blue. Now they're all red.
> > > >> >
> > > >> > I don't mind fixing the Jenkins build. But if we collectively
> > > >> > adopt some good practices, it would be easier to achieve the
> > > >> > goal of having stable builds.
> > > >> > For contributors, I understand that it takes so much time to run
> > > >> > the whole test suite that he/she may not have the luxury of
> > > >> > doing this - Apache Jenkins wouldn't do it when you press the
> > > >> > Submit Patch button.
> > > >> > If this is the case (let's call it scenario A), please use
> > > >> > Eclipse (or another tool) to identify the tests that exercise
> > > >> > the classes/methods in your patch and run them. Also clearly
> > > >> > state what tests you ran in the JIRA.
> > > >> >
> > > >> > If you have a Linux box where you can run the whole test suite,
> > > >> > it would be nice to utilize such a resource and run the whole
> > > >> > suite. Then please state this fact on the JIRA as well.
> > > >> > Considering Todd's suggestion of holding off commit for 24 hours
> > > >> > after code review, a 2 hour test run isn't that long.
> > > >> >
> > > >> > Sometimes you may see the following (from 0.92 build 18):
> > > >> >
> > > >> > Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21
> > > >> >
> > > >> > [INFO] ------------------------------------------------------------------------
> > > >> > [INFO] BUILD FAILURE
> > > >> > [INFO] ------------------------------------------------------------------------
> > > >> > [INFO] Total time: 1:51:41.797s
> > > >> >
> > > >> > You should examine the test summary above these lines and find
> > > >> > out which test(s) hung. For this case it was TestMasterFailover:
> > > >> >
> > > >> > Running org.apache.hadoop.hbase.master.TestMasterFailover
> > > >> > Running
> > > >> > org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
> > > >> > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
> > > >> > 32.265 sec
> > > >> >
> > > >> > I think a script should be developed that parses the test output
> > > >> > and identifies hanging test(s).
> > > >> >
> > > >> > For scenario A, I hope the committer would run the test suite.
> > > >> > The net effect would be a statement on the JIRA, saying all
> > > >> > tests passed.
> > > >> > Your comments/suggestions are welcome.
> >
> > --
> > // Jonathan Hsieh (shay)
> > // Software Engineer, Cloudera
> > // [email protected]
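(Ram's profile-runner idea and Ted's call for a log-parsing script, both from the thread above, could be combined. A rough sketch only: the profile ids flaky1/flaky2 are hypothetical, and matching leftover forks via "surefirebooter" is an assumption to verify against your surefire version:)

```shell
#!/usr/bin/env bash
# Sketch combining two suggestions from this thread:
#   1. run each Maven profile in turn, killing any surefire JVMs a hung
#      test left behind before starting the next batch (Ram's idea);
#   2. parse the console log for test classes that printed "Running ..."
#      but never a "Tests run: ..." summary, i.e. the hung ones (Ted's idea).

# Print each test class that started but never reported a result. In
# surefire console output, a "Tests run:" summary completes the most
# recent "Running <class>" line.
find_hung_tests() {
  awk '
    /^Running /   { if (pending != "") hung[pending] = 1; pending = $2 }
    /^Tests run:/ { pending = "" }
    END {
      if (pending != "") hung[pending] = 1
      for (t in hung) print t
    }
  ' "$1"
}

run_profiles() {
  # Profile ids are hypothetical -- use the batches defined in your pom.
  for profile in flaky1 flaky2; do
    log="target/test-$profile.log"
    mvn test -P "$profile" | tee "$log"
    # Kill forks left behind by hung tests before the next batch.
    pkill -f surefirebooter || true
    echo "Hung tests in profile $profile:"
    find_hung_tests "$log"
  done
}

# Guarded so the parser above can be used on an existing log without
# invoking Maven.
if [ "${1:-}" = "--run" ]; then
  run_profiles
fi
```

Pointed at the 0.92 build 18 excerpt Ted quoted, the parser would flag TestMasterFailover as the hung test, matching his diagnosis.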
