[
https://issues.apache.org/jira/browse/HBASE-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542726#comment-13542726
]
chunhui shen commented on HBASE-7299:
-------------------------------------
I have analysed the logs of trunk build #3686 and found the cause.
1. Both testBatchWithPut and testFlushCommitsWithAbort abort regionserver 0.
2. Before each test, we ensure that 2 regionservers are alive:
{code}
@Before
public void before() throws IOException {
  LOG.info("before");
  if (UTIL.ensureSomeRegionServersAvailable(slaves)) {
    // Distribute regions
    UTIL.getMiniHBaseCluster().getMaster().balance();
  }
  LOG.info("before done");
}
{code}
3. In trunk build #3686, testFlushCommitsWithAbort ran after testBatchWithPut:
{code}
2013-01-02 12:28:33,183 INFO [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testBatchWithPut
...
2013-01-02 12:30:08,410 INFO [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testFlushCommitsWithAbort
{code}
4. testFlushCommitsWithAbort aborts regionserver 0, which was already aborted
by testBatchWithPut, so we see the following log:
{code}
2013-01-02 12:30:08,410 INFO [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testFlushCommitsWithAbort
2013-01-02 12:30:08,410 INFO [pool-1-thread-1] client.TestMultiParallel(77): before
2013-01-02 12:30:08,410 INFO [pool-1-thread-1] hbase.LocalHBaseCluster(243): Not alive RegionServer:0;juno.apache.org,40265,1357129678691
2013-01-02 12:30:08,410 INFO [pool-1-thread-1] client.TestMultiParallel(82): before done
2013-01-02 12:30:08,410 INFO [Thread-709] client.TestMultiParallel(226): test=testFlushCommitsWithAbort
...
2013-01-02 12:30:09,059 INFO [Thread-709] hbase.LocalHBaseCluster(243): Not alive RegionServer:0;juno.apache.org,40265,1357129678691
2013-01-02 12:30:09,059 INFO [Thread-709] client.TestMultiParallel(277): Count=1, Alive=juno.apache.org,40198,1357129678744
2013-01-02 12:30:09,059 INFO [Thread-709] client.TestMultiParallel(277): Count=2, Alive=juno.apache.org,51431,1357129753348
{code}
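5. From the above, it's clear that there are 3 regionservers in total and 2 of
them are alive, but testFlushCommitsWithAbort assumes there are only 2
regionservers in total. Aborting the already-dead regionserver 0 therefore
leaves 2 alive where the test expects 1.
Uploading addendum2 to fix this bug in the test case.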
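To make the sequence above easier to follow, here is a minimal standalone sketch of the failure mode. It does not use HBase and is not the content of addendum2; the class AbortSketch, the Server type, the names rs0/rs1/rs2 and the abortFirstLive approach are all hypothetical and only model the log excerpt above (server 0 dead, two others alive).

```java
import java.util.ArrayList;
import java.util.List;

public class AbortSketch {
    /** Hypothetical stand-in for a regionserver in the mini cluster. */
    static final class Server {
        final String name;
        boolean alive = true;
        Server(String name) { this.name = name; }
    }

    /** Counts live servers, like the Count=/Alive= lines in the log. */
    static int liveCount(List<Server> servers) {
        int n = 0;
        for (Server s : servers) {
            if (s.alive) n++;
        }
        return n;
    }

    /** Aborts by position. Returns false if that server was already dead. */
    static boolean abortByIndex(List<Server> servers, int index) {
        Server s = servers.get(index);
        boolean wasAlive = s.alive;
        s.alive = false;
        return wasAlive;
    }

    /** Assumed alternative: abort the first server that is still alive. */
    static boolean abortFirstLive(List<Server> servers) {
        for (Server s : servers) {
            if (s.alive) {
                s.alive = false;
                return true;
            }
        }
        return false;
    }

    /** State at the start of the second test: 3 servers in total, 2 alive. */
    static List<Server> clusterAfterFirstAbort() {
        List<Server> servers = new ArrayList<>();
        servers.add(new Server("rs0"));
        servers.add(new Server("rs1"));
        abortByIndex(servers, 0);       // first test aborts server 0
        servers.add(new Server("rs2")); // before() brings the live count back to 2
        return servers;
    }

    public static void main(String[] args) {
        List<Server> a = clusterAfterFirstAbort();
        abortByIndex(a, 0); // server 0 is already dead, nothing changes
        System.out.println("abort by index 0: alive=" + liveCount(a));

        List<Server> b = clusterAfterFirstAbort();
        abortFirstLive(b); // stops a server that is actually running
        System.out.println("abort first live: alive=" + liveCount(b));
    }
}
```

In this model the index-based abort leaves 2 servers alive, which corresponds to the "expected:<1> but was:<2>" assertion failure, while aborting a live server leaves 1.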
> TestMultiParallel fails intermittently in trunk builds
> ------------------------------------------------------
>
> Key: HBASE-7299
> URL: https://issues.apache.org/jira/browse/HBASE-7299
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: chunhui shen
> Priority: Critical
> Fix For: 0.96.0
>
> Attachments: 7299.addendum, 7299-v4.txt, HBASE-7299.patch,
> HBASE-7299v2.patch, HBASE-7299v3.patch
>
>
> From trunk build #3598:
> {code}
> testFlushCommitsNoAbort(org.apache.hadoop.hbase.client.TestMultiParallel):
> Count of regions=8
> {code}
> It failed in 3595 as well:
> {code}
> java.lang.AssertionError: Server count=2, abort=true expected:<1> but was:<2>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at org.apache.hadoop.hbase.client.TestMultiParallel.doTestFlushCommits(TestMultiParallel.java:267)
> at org.apache.hadoop.hbase.client.TestMultiParallel.testFlushCommitsWithAbort(TestMultiParallel.java:226)
> {code}