Re: maintaining stable HBase build

Andrew Purtell Sat, 24 Sep 2011 09:43:19 -0700

Thanks Ted.


> Since both Gary and Eugene have been working on HBASE-4014 for quite some
> time, I didn't initially question the test cases.

This is understandable, but I think we should just not have this kind of trust.
:-) I've been burned before too, by committing something I thought was fine
because of who submitted it. You can never know.


Best regards,


       - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)


----- Original Message -----
> From: Ted Yu <[email protected]>
> To: [email protected]; Andrew Purtell <[email protected]>
> Cc: 
> Sent: Saturday, September 24, 2011 9:31 AM
> Subject: Re: maintaining stable HBase build
> 
>>>  It should never have gone in if only to be reverted 35 minutes later.
> (What happened?)
> 
> Since both Gary and Eugene have been working on HBASE-4014 for quite some
> time, I didn't initially question the test cases.
> After integrating the patch for TRUNK, I discovered that
> TestRegionServerCoprocessorExceptionWithAbort failed consistently on Mac and
> Linux. So I backed it out.
> I first thought of disabling this particular test but later abandoned that
> idea - if a core test fails, this means the feature may have an issue.
> I notified Eugene immediately and he will take a look today.
> 
>>>  Scrolling down the commit history for trunk further, is a series of
> half-commits, addendums, reverts, reverts of reverts, etc.
> 
> If you were talking about
> HBASE-4132<https://issues.apache.org/jira/browse/HBASE-4132>,
> I initially tried to salvage the JIRA by adjusting the triggering assertion.
> However, that turned out to be not so trivial. So I reopened the JIRA.
> 
> Just FYI
> 
> On Sat, Sep 24, 2011 at 9:13 AM, Andrew Purtell <[email protected]> 
> wrote:
> 
>>  +1
>> 
>>  This:
>>  >>>
>>  > For contributors, I understand that it takes so much time to run whole
>>  > test suite that he/she may not have the luxury of doing this - Apache
>>  > Jenkins wouldn't do it when you press Submit Patch button.
>>  > If this is the case (let's call it scenario A), please use Eclipse (or
>>  > other tool) to identify tests that exercise the classes/methods in your
>>  > patch and run them. Also clearly state what tests you ran in the JIRA.
>>  <<<
>> 
>>  and
>> 
>>  >>>
>>  > For scenario A, I hope committer would run test suite.
>> 
>>  <<<
>> 
>> 
>>  should be added to the How To Contribute page, IMHO.
>> 
>> 
>>  I see that HBASE-4014 went in -- which is important, so let's fix it and
>>  try again -- and then went right out again, reverted after 35 minutes. It
>>  should never have gone in if only to be reverted 35 minutes later. (What
>>  happened?) Scrolling down the commit history for trunk further, is a series
>>  of half-commits, addendums, reverts, reverts of reverts, etc.
>> 
>>  It has recently become difficult to cherry pick any single commit from
>>  trunk and get all of the necessary parts of a change together or have any
>>  assurance the change is not toxic. This is not just a maintainer issue --
>>  diffing the full extent of a change to understand it fully mixes in
>>  unrelated changes between the initial commit and addendums, unless one
>>  resorts to octopus like contortions with git.
>> 
>> 
>>  So what is the solution? Submitted for your consideration:
>> 
>> 
>>  Committers should apply a candidate change and run the full test suite
>>  before committing the change to trunk or any branch. If applying a change
>>  to a branch, a full test suite run of the branch code should complete
>>  successfully before commit there as well.
>> 
>>  No patch is so pressing that it cannot wait for tests to finish before
>>  commit, IMO.
>> 
>>  If a test fails, the patch does not go in.
>> 
>>  If a test fails repeatedly for unrelated reasons, the test comes out and a
>>  jira to fix it gets opened.
>> 
>>  Finally, I can see where people are trying to fix the build, so please
>>  exclude those commits from my complaint here; that is not part of the
>>  problem.
>> 
>>  Best regards,
>> 
>> 
>>         - Andy
>> 
>>  Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>  (via Tom White)
>> 
>> 
>>  ----- Original Message -----
>>  > From: Ted Yu <[email protected]>
>>  > To: [email protected]
>>  > Cc:
>>  > Sent: Saturday, September 24, 2011 3:51 AM
>>  > Subject: maintaining stable HBase build
>>  >
>>  > Hi,
>>  > I want to bring the importance of maintaining stable HBase build to our
>>  > attention.
>>  > A stable HBase build is important, not just for the next release but also
>>  > for authors of the pending patches to verify the correctness of their
>>  > work.
>>  >
>>  > At some time on Thursday (Sept 22nd) 0.90, 0.92 and TRUNK builds were all
>>  > blue. Now they're all red.
>>  >
>>  > I don't mind fixing Jenkins build. But if we collectively adopt some good
>>  > practice, it would be easier to achieve the goal of having stable builds.
>>  >
>>  > For contributors, I understand that it takes so much time to run whole
>>  > test suite that he/she may not have the luxury of doing this - Apache
>>  > Jenkins wouldn't do it when you press Submit Patch button.
>>  > If this is the case (let's call it scenario A), please use Eclipse (or
>>  > other tool) to identify tests that exercise the classes/methods in your
>>  > patch and run them. Also clearly state what tests you ran in the JIRA.
>>  >
>>  > If you have a Linux box where you can run whole test suite, it would be
>>  > nice to utilize such resource and run whole suite. Then please state this
>>  > fact on the JIRA as well.
>>  > Considering Todd's suggestion of holding off commit for 24 hours after
>>  > code review, 2 hour test run isn't that long.
>>  >
>>  > Sometimes you may see the following (from 0.92 build 18):
>>  >
>>  > Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21
>>  >
>>  > [INFO] ------------------------------------------------------------------------
>>  > [INFO] BUILD FAILURE
>>  > [INFO] ------------------------------------------------------------------------
>>  > [INFO] Total time: 1:51:41.797s
>>  >
>>  > You should examine the test summary above these lines and find out
>>  > which test(s) hung. For this case it was TestMasterFailover:
>>  >
>>  > Running org.apache.hadoop.hbase.master.TestMasterFailover
>>  > Running org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
>>  > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec
>>  >
>>  > I think a script should be developed that parses test output and
>>  > identifies hanging test(s).
>>  >
>>  > For scenario A, I hope committer would run test suite.
>>  > The net effect would be a statement on the JIRA, saying all tests passed.
>>  >
>>  > Your comments/suggestions are welcome.
>>  >
>> 
>
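
The original message in this thread suggests a script that parses test output
and identifies hanging tests. A minimal sketch of that idea in Python follows.
It assumes console output in the shape quoted above: each test class prints a
"Running <class>" line and, once it completes, a "Tests run: ..., Time
elapsed: ..." line. It also assumes tests run one at a time, so a result line
belongs to the most recent "Running" line. The function name find_hung_tests
is hypothetical and not part of any HBase tooling.

```python
import re

# "Running org.apache...TestFoo" marks the start of a test class.
RUNNING = re.compile(r"^Running\s+(\S+)\s*$")
# A per-class result line carries "Time elapsed"; the final build summary
# ("Tests run: 1004, ...") does not, so it is deliberately not matched here.
RESULT = re.compile(r"^Tests run:.*Time elapsed:")


def find_hung_tests(lines):
    """Return test classes that started but never printed a result line.

    Assumption: tests run sequentially, so a result line always belongs to
    the most recently started test class.
    """
    hung = []
    pending = None  # test class that has started but not yet reported
    for raw in lines:
        line = raw.strip()
        match = RUNNING.match(line)
        if match:
            # A new test started while the previous one never reported.
            if pending is not None:
                hung.append(pending)
            pending = match.group(1)
        elif RESULT.match(line):
            pending = None
    # A test still pending at end of output never reported either.
    if pending is not None:
        hung.append(pending)
    return hung
```

Applied to the lines quoted from 0.92 build 18, this reports
org.apache.hadoop.hbase.master.TestMasterFailover, since its "Running" line is
followed by another "Running" line rather than a result. If test classes run
in parallel, output can interleave and this heuristic would need to be
replaced by something that matches results to classes explicitly.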
