Thanks Ted.
> Since both Gary and Eugene have been working on HBASE-4014 for quite some
> time, I didn't initially question the test cases.

This is understandable but I think we should just not have this kind of trust. :-) I've been burned by committing something that I thought was fine due to the submitter before too. You can never know.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


----- Original Message -----
> From: Ted Yu <[email protected]>
> To: [email protected]; Andrew Purtell <[email protected]>
> Cc:
> Sent: Saturday, September 24, 2011 9:31 AM
> Subject: Re: maintaining stable HBase build
>
>>> It should never have gone in if only to be reverted 35 minutes later.
> (What happened?)
>
> Since both Gary and Eugene have been working on HBASE-4014 for quite some
> time, I didn't initially question the test cases.
> After integrating the patch for TRUNK, I discovered that
> TestRegionServerCoprocessorExceptionWithAbort failed consistently on Mac and
> Linux. So I backed it out.
> I first thought of disabling this particular test but later abandoned that
> idea - if a core test fails, this means the feature may have issue.
> I notified Eugene immediately and he will take a look today.
>
>>> Scrolling down the commit history for trunk further, is a series of
> half-commits, addendums, reverts, reverts of reverts, etc.
>
> If you were talking about
> HBASE-4132<https://issues.apache.org/jira/browse/HBASE-4132>,
> I initially tried to salvage the JIRA by adjusting the triggering assertion.
> However, that turned out to be not so trivial. So I reopened the JIRA.
>
> Just FYI
>
> On Sat, Sep 24, 2011 at 9:13 AM, Andrew Purtell <[email protected]> wrote:
>
>> +1
>>
>> This:
>> >>>
>> > For contributors, I understand that it takes so much time to run whole test
>> > suite that he/she may not have the luxury of doing this - Apache Jenkins
>> > wouldn't do it when you press Submit Patch button.
>> > If this is the case (let's call it scenario A), please use Eclipse (or other
>> > tool) to identify tests that exercise the classes/methods in your patch and
>> > run them. Also clearly state what tests you ran in the JIRA.
>> <<<
>>
>> and
>>
>> >>>
>> > For scenario A, I hope committer would run test suite.
>>
>> <<<
>>
>>
>> should be added to the How To Contribute page, IMHO.
>>
>>
>> I see that HBASE-4014 went in -- which is important, so let's fix it and
>> try again -- and then went right out again, reverted after 35 minutes. It
>> should never have gone in if only to be reverted 35 minutes later. (What
>> happened?) Scrolling down the commit history for trunk further, is a series
>> of half-commits, addendums, reverts, reverts of reverts, etc.
>>
>> It has recently become difficult to cherry pick any single commit from
>> trunk and get all of the necessary parts of a change together or have any
>> assurance the change is not toxic. This is not just a maintainer issue --
>> diffing the full extent of a change to understand it fully mixes in
>> unrelated changes between the initial commit and addendums, unless one
>> resorts to octopus like contortions with git.
>>
>>
>> So what is the solution? Submitted for your consideration:
>>
>>
>> Committers should apply a candidate change and run the full test suite
>> before committing the change to trunk or any branch. If applying a change to
>> a branch, a full test suite run of the branch code should complete
>> successfully before commit there as well.
>>
>> No patch is so pressing that it cannot wait for tests to finish before
>> commit, IMO.
>>
>> If a test fails, the patch does not go in.
>>
>> If a test fails repeatedly for unrelated reasons, the test comes out and a
>> jira to fix it gets opened.
>>
>> Finally, I can see where people are trying to fix the build, so please exclude
>> those commits from my complaint here, that is not part of the problem.
>> Best regards,
>>
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>>
>> ----- Original Message -----
>> > From: Ted Yu <[email protected]>
>> > To: [email protected]
>> > Cc:
>> > Sent: Saturday, September 24, 2011 3:51 AM
>> > Subject: maintaining stable HBase build
>> >
>> > Hi,
>> > I want to bring the importance of maintaining stable HBase build to our
>> > attention.
>> > A stable HBase build is important, not just for the next release but also
>> > for authors of the pending patches to verify the correctness of their work.
>> >
>> > At some time on Thursday (Sept 22nd) 0.90, 0.92 and TRUNK builds were all
>> > blue. Now they're all red.
>> >
>> > I don't mind fixing Jenkins build. But if we collectively adopt some good
>> > practice, it would be easier to achieve the goal of having stable builds.
>> >
>> > For contributors, I understand that it takes so much time to run whole test
>> > suite that he/she may not have the luxury of doing this - Apache Jenkins
>> > wouldn't do it when you press Submit Patch button.
>> > If this is the case (let's call it scenario A), please use Eclipse (or other
>> > tool) to identify tests that exercise the classes/methods in your patch and
>> > run them. Also clearly state what tests you ran in the JIRA.
>> >
>> > If you have a Linux box where you can run whole test suite, it would be nice
>> > to utilize such resource and run whole suite. Then please state this fact on
>> > the JIRA as well.
>> > Considering Todd's suggestion of holding off commit for 24 hours after code
>> > review, 2 hour test run isn't that long.
>> >
>> > Sometimes you may see the following (from 0.92 build 18):
>> >
>> > Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21
>> >
>> > [INFO] ------------------------------------------------------------------------
>> > [INFO] BUILD FAILURE
>> > [INFO] ------------------------------------------------------------------------
>> > [INFO] Total time: 1:51:41.797s
>> >
>> > You should examine the test summary above these lines and find out
>> > which test(s) hung. For this case it was TestMasterFailover:
>> >
>> > Running org.apache.hadoop.hbase.master.TestMasterFailover
>> > Running org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
>> > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec
>> >
>> > I think a script should be developed that parses test output and
>> > identify hanging test(s).
>> >
>> > For scenario A, I hope committer would run test suite.
>> > The net effect would be a statement on the JIRA, saying all tests passed.
>> >
>> > Your comments/suggestions are welcome.
>> >
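
For what it's worth, the script Ted asks for in the original message (one that parses test output and identifies hanging tests) could look roughly like the sketch below. It is only an illustration, not an existing HBase tool. It assumes the tests run one at a time, so that each "Running <class>" line is normally followed by that class's "Tests run: ..., Time elapsed: ..." line, as in the excerpt from 0.92 build 18. Any class whose "Running" line is followed by another "Running" line, or by the end of the log, is reported as a hang candidate. The function name find_hanging_tests is made up for this example.

```python
import re
import sys

# Line printed when a test class starts, e.g.
#   Running org.apache.hadoop.hbase.master.TestMasterFailover
RUNNING = re.compile(r"^Running\s+(\S+)\s*$")

# Per-class result line; the final build summary has no "Time elapsed" part,
# so it is not mistaken for a per-class result.
RESULT = re.compile(r"^Tests run: \d+,.*Time elapsed:")


def find_hanging_tests(lines):
    """Return test classes that started but never reported a result.

    Assumption (not from the thread): tests run sequentially, so output
    from different test classes is not interleaved.
    """
    hung = []
    pending = None  # class that has started and not yet reported
    for raw in lines:
        line = raw.strip()
        started = RUNNING.match(line)
        if started:
            if pending is not None:
                # The previous class never printed its result line.
                hung.append(pending)
            pending = started.group(1)
        elif RESULT.match(line):
            pending = None
    if pending is not None:
        # The log ended while a class was still running.
        hung.append(pending)
    return hung


if __name__ == "__main__":
    # Reads the captured build log from standard input.
    for name in find_hanging_tests(sys.stdin):
        print(name)
```

Fed the three lines quoted from build 18, this reports TestMasterFailover, which matches what Ted found by hand. If the build runs test classes in parallel, the output is interleaved and this simple pairing no longer holds.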
