Hi Ted,

Yes, we need to investigate the hanging tests separately.

Regards
Ram
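(Ted's jstack suggestion in the reply quoted below can be scripted. A minimal sketch, assuming surefire's forked JVMs carry "surefirebooter" on their command line and that pgrep and jstack are on the PATH; the function name and output directory are made up for illustration:)

```shell
#!/usr/bin/env bash
# Sketch: grab a jstack from each running surefire fork so a hung test can
# be investigated before its JVM is killed. That the forks are identifiable
# by "surefirebooter" on the command line is an assumption to verify locally.

capture_stacks() {
  pattern="$1"   # pattern identifying the JVMs to dump
  outdir="$2"    # directory for the jstack output files
  mkdir -p "$outdir"
  for pid in $(pgrep -f "$pattern"); do
    # Ignore failures: the process may exit between pgrep and jstack,
    # or jstack may be unavailable on this box.
    jstack "$pid" > "$outdir/jstack-$pid.txt" 2>/dev/null || true
  done
}

capture_stacks surefirebooter target/stacks
```

Run between test batches, this would leave one stack dump per still-running fork for later inspection.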
-----Original Message-----
From: Ted Yu [mailto:[email protected]]
Sent: Tuesday, September 27, 2011 12:05 AM
To: [email protected]
Subject: Re: maintaining stable HBase build

>> we can kill the java processes that are hanging if any testcases hangs.
I think it is very important to find out why certain tests hang. Obtaining a
jstack is the first step in terms of investigation.

Regards

On Mon, Sep 26, 2011 at 11:31 AM, Ramakrishna S Vasudevan 00902313 <
[email protected]> wrote:

> Hi
>
> Just wanted to share one thing that I learnt today in Maven for running
> testcases. Maybe many already know this.
>
> We usually face problems like this: when we run testcases as a bunch, a
> few fail due to system problems or improper clean up by previous
> testcases.
>
> As Jon suggested, we can separate out the flaky test cases from the
> correct ones.
>
> In Maven we have a facility called profiles. We can take the testcases
> that we have separated out (maybe in 2 to 3 batches) and add them to
> separate profiles.
>
> We can invoke these profiles like mvn test -P "profileid".
>
> We can write a script that executes every profile, and in between
> profiles we can kill any java processes left hanging if a testcase hangs.
>
> Just a suggestion. If you feel it suits some need in any of your project
> work, you can use it.
>
> Regards
> Ram
>
> ----- Original Message -----
> From: Jonathan Hsieh <[email protected]>
> Date: Monday, September 26, 2011 11:15 pm
> Subject: Re: maintaining stable HBase build
> To: [email protected], lars hofhansl <[email protected]>
>
> > I've been hunting some flaky tests down as well -- a few weeks back I
> > was testing some changes along the line of HBASE-4326. (Maybe some of
> > these are fixed?)
> >
> > First, two tests seemed to flake fairly frequently and were likely
> > problems internal to the tests (TestReplication, TestMasterFailover).
> >
> > There is a second set of tests that, after applying a draft of
> > HBASE-4326, seems to move to a different set of tests. I'm pretty
> > convinced there are some cross-test problems with these. This was on an
> > 0.90.4 based branch, and by now several more changes have gone in. I'm
> > getting back to HBASE-4326 and will try to get more stats on this.
> >
> > Alternately, I exclude the tests that I identify as flaky from the main
> > test run and have a separate test run that only runs the flaky tests.
> > The hooks for the excludes build are in the hbase pom but only work
> > with maven surefire 2.6, or 2.10 when it comes out (there is a bug in
> > surefire). See this jira for more details:
> > http://jira.codehaus.org/browse/SUREFIRE-766
> >
> > Jon.
> >
> > On Sun, Sep 25, 2011 at 2:27 PM, lars hofhansl <[email protected]>
> > wrote:
> >
> > > At Salesforce we call these "flappers" and they are considered almost
> > > worse than failing tests, as they add noise to a test run without
> > > adding confidence.
> > > A test that fails once in - say - 10 runs is worthless.
> > >
> > > ________________________________
> > > From: Ted Yu <[email protected]>
> > > To: [email protected]
> > > Sent: Sunday, September 25, 2011 1:41 PM
> > > Subject: Re: maintaining stable HBase build
> > >
> > > As of 1:38 PST Sunday, the three builds all passed.
> > >
> > > I think we have some tests that exhibit non-deterministic behavior.
> > >
> > > I suggest committers interleave patch submissions by a 2 hour span so
> > > that we can more easily identify patch(es) that break the build.
> > >
> > > Thanks
> > >
> > > On Sun, Sep 25, 2011 at 7:45 AM, Ted Yu <[email protected]> wrote:
> > >
> > > > I wrote a short blog:
> > > > http://zhihongyu.blogspot.com/2011/09/streamlining-patch-submission.html
> > > > It is geared towards contributors.
> > > >
> > > > Cheers
> > > >
> > > > On Sat, Sep 24, 2011 at 9:16 AM, Ramakrishna S Vasudevan 00902313 <
> > > > [email protected]> wrote:
> > > >
> > > >> Hi
> > > >>
> > > >> Ted, I agree with you. Pasting the testcase results in the JIRA is
> > > >> also fine, mainly when there are some testcase failures when we
> > > >> run locally; if we feel a failure is not due to the fix we have
> > > >> added, we can mention that as well. I think rather than on a
> > > >> Windows machine it's better to run on a Linux box.
> > > >>
> > > >> +1 for your suggestion Ted.
> > > >>
> > > >> Can we add the feature, like in HDFS, where Jenkins automatically
> > > >> runs the testcases when we submit a patch?
> > > >>
> > > >> At least till this is done I go with your suggestion.
> > > >>
> > > >> Regards
> > > >> Ram
> > > >>
> > > >> ----- Original Message -----
> > > >> From: Ted Yu <[email protected]>
> > > >> Date: Saturday, September 24, 2011 4:22 pm
> > > >> Subject: maintaining stable HBase build
> > > >> To: [email protected]
> > > >>
> > > >> > Hi,
> > > >> > I want to bring the importance of maintaining a stable HBase
> > > >> > build to our attention.
> > > >> > A stable HBase build is important, not just for the next release
> > > >> > but also for authors of pending patches to verify the
> > > >> > correctness of their work.
> > > >> >
> > > >> > At some time on Thursday (Sept 22nd) the 0.90, 0.92 and TRUNK
> > > >> > builds were all blue. Now they're all red.
> > > >> >
> > > >> > I don't mind fixing the Jenkins build. But if we collectively
> > > >> > adopt some good practices, it would be easier to achieve the
> > > >> > goal of having stable builds.
> > > >> > For contributors, I understand that it takes so much time to run
> > > >> > the whole test suite that he/she may not have the luxury of
> > > >> > doing this - Apache Jenkins wouldn't do it when you press the
> > > >> > Submit Patch button.
> > > >> > If this is the case (let's call it scenario A), please use
> > > >> > Eclipse (or another tool) to identify the tests that exercise
> > > >> > the classes/methods in your patch and run them. Also clearly
> > > >> > state what tests you ran in the JIRA.
> > > >> >
> > > >> > If you have a Linux box where you can run the whole test suite,
> > > >> > it would be nice to utilize such a resource and run the whole
> > > >> > suite. Then please state this fact on the JIRA as well.
> > > >> > Considering Todd's suggestion of holding off commit for 24 hours
> > > >> > after code review, a 2 hour test run isn't that long.
> > > >> >
> > > >> > Sometimes you may see the following (from 0.92 build 18):
> > > >> >
> > > >> > Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21
> > > >> >
> > > >> > [INFO] ------------------------------------------------------------------------
> > > >> > [INFO] BUILD FAILURE
> > > >> > [INFO] ------------------------------------------------------------------------
> > > >> > [INFO] Total time: 1:51:41.797s
> > > >> >
> > > >> > You should examine the test summary above these lines and find
> > > >> > out which test(s) hung. For this case it was TestMasterFailover:
> > > >> >
> > > >> > Running org.apache.hadoop.hbase.master.TestMasterFailover
> > > >> > Running
> > > >> > org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
> > > >> > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
> > > >> > 32.265 sec
> > > >> >
> > > >> > I think a script should be developed that parses the test output
> > > >> > and identifies hanging test(s).
> > > >> >
> > > >> > For scenario A, I hope the committer would run the test suite.
> > > >> > The net effect would be a statement on the JIRA, saying all
> > > >> > tests passed.
> > > >> > Your comments/suggestions are welcome.
> >
> > --
> > // Jonathan Hsieh (shay)
> > // Software Engineer, Cloudera
> > // [email protected]
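(Ram's profile-runner idea and Ted's call for a log-parsing script, both from the thread above, could be combined. A rough sketch only: the profile ids flaky1/flaky2 are hypothetical, and matching leftover forks via "surefirebooter" is an assumption to verify against your surefire version:)

```shell
#!/usr/bin/env bash
# Sketch combining two suggestions from this thread:
#   1. run each Maven profile in turn, killing any surefire JVMs a hung
#      test left behind before starting the next batch (Ram's idea);
#   2. parse the console log for test classes that printed "Running ..."
#      but never a "Tests run: ..." summary, i.e. the hung ones (Ted's idea).

# Print each test class that started but never reported a result. In
# surefire console output, a "Tests run:" summary completes the most
# recent "Running <class>" line.
find_hung_tests() {
  awk '
    /^Running /   { if (pending != "") hung[pending] = 1; pending = $2 }
    /^Tests run:/ { pending = "" }
    END {
      if (pending != "") hung[pending] = 1
      for (t in hung) print t
    }
  ' "$1"
}

run_profiles() {
  # Profile ids are hypothetical -- use the batches defined in your pom.
  for profile in flaky1 flaky2; do
    log="target/test-$profile.log"
    mvn test -P "$profile" | tee "$log"
    # Kill forks left behind by hung tests before the next batch.
    pkill -f surefirebooter || true
    echo "Hung tests in profile $profile:"
    find_hung_tests "$log"
  done
}

# Guarded so the parser above can be used on an existing log without
# invoking Maven.
if [ "${1:-}" = "--run" ]; then
  run_profiles
fi
```

Pointed at the 0.92 build 18 excerpt Ted quoted, the parser would flag TestMasterFailover as the hung test, matching his diagnosis.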
