[ https://issues.apache.org/jira/browse/HBASE-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786076#comment-13786076 ]
Hudson commented on HBASE-9703:
-------------------------------

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #777 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/777/])
HBASE-9703 DistributedHBaseCluster should not throw exceptions, but do a best effort restore (enis: rev 1529045)
* /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/DistributedHBaseCluster.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseCluster.java


> DistributedHBaseCluster should not throw exceptions, but do a best effort restore
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-9703
>                 URL: https://issues.apache.org/jira/browse/HBASE-9703
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.98.0, 0.96.0
>
>         Attachments: hbase-9703_v1.patch, hbase-9703_v3.patch
>
>
> At the end of integration tests, we call
> DistributedCluster.restoreCluster() in case CM (ChaosMonkey) has killed
> nodes, so that we leave the cluster in the same state in which we took it
> over.
> However, if CM is not used in a test (for example ITLoadAndVerify), but some
> region servers die, or an external daemon kills the servers, we will still
> try to restore at the end of the test, which may or may not succeed
> (depending on configuration, the region server being inaccessible, etc.).
> We can do one of two things: either do a best-effort cluster restore, which
> will not fail the test if there are any errors, or skip running the restore
> if no disruptive actions have taken place.
> I am leaning towards the former, since if an RS goes down, with or without
> CM, due to a bad disk etc., we cannot restore the cluster, but we should not
> fail the test in this case.
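
The best-effort option described in the issue can be illustrated with a minimal sketch. This is not the code from the attached patches: the class BestEffortRestore, the interface RestoreStep, and the method restoreBestEffort are hypothetical names chosen for illustration, and the steps stand in for whatever actions a real restore would take. The idea is only that each step is attempted, failures are recorded rather than thrown, and the caller gets a success flag it may log without failing the test.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BestEffortRestore {

    /** Hypothetical stand-in for one restore action, e.g. restarting a server. */
    public interface RestoreStep {
        void run() throws IOException;
    }

    /**
     * Runs every step even if earlier ones fail. Failures are appended to
     * {@code deferred} instead of being propagated to the caller.
     *
     * @return true only if every step completed without an IOException
     */
    public static boolean restoreBestEffort(List<RestoreStep> steps,
                                            List<IOException> deferred) {
        boolean success = true;
        for (RestoreStep step : steps) {
            try {
                step.run();
            } catch (IOException e) {
                // Record the failure and continue with the remaining steps.
                deferred.add(e);
                success = false;
            }
        }
        return success;
    }

    public static void main(String[] args) {
        List<RestoreStep> steps = new ArrayList<>();
        steps.add(() -> System.out.println("restored master"));
        steps.add(() -> { throw new IOException("region server unreachable"); });
        steps.add(() -> System.out.println("restored region server 2"));

        List<IOException> deferred = new ArrayList<>();
        boolean ok = restoreBestEffort(steps, deferred);

        // A test harness could log this outcome instead of failing the test.
        System.out.println("fully restored: " + ok + ", failures: " + deferred.size());
    }
}
```

Collecting the exceptions, rather than swallowing them silently, keeps the failure visible in logs while still letting the remaining steps run, which matches the intent stated in the issue of not failing the test when a server cannot be brought back.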