[
https://issues.apache.org/jira/browse/HBASE-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345078#comment-14345078
]
zhangduo commented on HBASE-13145:
----------------------------------
Found this in
https://builds.apache.org/job/PreCommit-HBASE-Build/13059//testReport/org.apache.hadoop.hbase.client/TestCloneSnapshotFromClient/org_apache_hadoop_hbase_client_TestCloneSnapshotFromClient/
{noformat}
2015-03-03 12:00:29,530 INFO [M:0;asf900:49662] regionserver.HRegionServer(1795): STOPPED: One or more threads are no longer alive -- stop
{noformat}
It seems we failed the isHealthy check and stopped the HMaster while it was starting up.
{code:title=HRegionServer.java}
// Verify that all threads are alive
if (!(leases.isAlive()
    && cacheFlusher.isAlive() && walRoller.isAlive()
    && this.compactionChecker.isScheduled()
    && this.periodicFlusher.isScheduled())) {
  stop("One or more threads are no longer alive -- stop");
  return false;
}
{code}
> TestNamespaceAuditor.testRegionMerge is flaky
> ---------------------------------------------
>
> Key: HBASE-13145
> URL: https://issues.apache.org/jira/browse/HBASE-13145
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 2.0.0, 1.1.0
> Reporter: zhangduo
> Assignee: zhangduo
> Attachments: HBASE-13145.patch
>
>
> Digging into the log at
> https://builds.apache.org/job/HBase-TRUNK/6197/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.namespace.TestNamespaceAuditor-output.txt
> It seems that a split operation which we expect to succeed is started before
> the merge has finished, and this causes an infinite sleep loop.
> I guess the problem is here:
> {code:title=TestNamespaceAuditor.java}
> // merge the two regions
> admin.mergeRegions(hris.get(0).getEncodedNameAsBytes(),
>   hris.get(1).getEncodedNameAsBytes(), false);
>
> while (admin.getTableRegions(tableTwo).size() == initialRegions) {
>   Thread.sleep(100);
> }
> {code}
> I guess that during a merge the region count can temporarily be higher than
> before, because we first online the new region and only then offline the two
> old regions. The loop above exits as soon as the count differs from
> initialRegions, so it can exit while the merge is still in progress.
> Changing the condition to
> admin.getTableRegions(tableTwo).size() != initialRegions - 1 may work.
> We can also replace the while loop with Waiter.waitFor, which provides more
> useful information when the test fails.
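
For illustration, the bounded wait proposed in the description can be sketched without HBase on the classpath. The `waitForRegionCount` helper and the simulated region counts below are hypothetical stand-ins for `admin.getTableRegions(tableTwo).size()` and for `Waiter.waitFor`; this is a rough sketch of the idea, not the attached patch.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class RegionCountWait {

    /**
     * Polls regionCount until it equals expected, or fails with a descriptive
     * message once timeoutMs has elapsed. Returns the number of polls made.
     * Hypothetical helper: it only mimics the idea of a bounded wait.
     */
    static int waitForRegionCount(IntSupplier regionCount, int expected,
                                  long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        int polls = 0;
        while (true) {
            int current = regionCount.getAsInt();
            polls++;
            if (current == expected) {
                return polls;
            }
            if (System.nanoTime() >= deadline) {
                // Unlike a bare sleep loop, report what was actually observed.
                throw new IllegalStateException("Timed out after " + timeoutMs
                    + " ms: expected " + expected + " regions, last saw " + current);
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int initialRegions = 3;
        // Simulated counts during a merge: unchanged, then the merged region
        // comes online (4), then the two parents go offline (2).
        int[] observed = {3, 4, 2};
        AtomicInteger next = new AtomicInteger();
        IntSupplier counts =
            () -> observed[Math.min(next.getAndIncrement(), observed.length - 1)];

        // Wait for exactly initialRegions - 1, so the transient count of 4
        // does not end the wait early.
        int polls = waitForRegionCount(counts, initialRegions - 1, 5000, 10);
        System.out.println("merge visible after " + polls + " polls");
    }
}
```

With the original condition (`size() == initialRegions`), the simulated sequence would already leave the loop at the transient count of 4, which is the race suspected above. The timeout also turns an infinite sleep into a failure that names the last observed count.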
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)