[jira] [Commented] (HBASE-9703) DistributedHBaseCluster should not throw exceptions, but do a best effort restore

Hudson (JIRA) Fri, 04 Oct 2013 04:25:55 -0700

    [ https://issues.apache.org/jira/browse/HBASE-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786076#comment-13786076 ]


Hudson commented on HBASE-9703:
-------------------------------

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #777 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/777/])
HBASE-9703 DistributedHBaseCluster should not throw exceptions, but do a best 
effort restore (enis: rev 1529045)
* /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/DistributedHBaseCluster.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseCluster.java


> DistributedHBaseCluster should not throw exceptions, but do a best effort 
> restore
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-9703
>                 URL: https://issues.apache.org/jira/browse/HBASE-9703
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.98.0, 0.96.0
>
>         Attachments: hbase-9703_v1.patch, hbase-9703_v3.patch
>
>
> At the end of integration tests, we call 
> DistributedCluster.restoreCluster() in case ChaosMonkey (CM) has killed 
> nodes, so that we leave the cluster in the same state in which we took it 
> over. 
> However, if CM is not used in a test (for example ITLoadAndVerify), but some 
> region servers die, or an external daemon kills the servers, we still 
> try to restore at the end of the test, which may or may not succeed (depending 
> on configuration, whether the region server is reachable, etc.). 
> We can do one of two things: either do a best-effort cluster restore that 
> does not fail the test if there are any errors, or skip the restore if no 
> disruptive actions have taken place. 
> I am leaning towards the former, since if an RS goes down, with or without 
> CM, due to a bad disk etc., we cannot restore the cluster, but we should not 
> fail the test in this case. 
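
The "best effort" option described in the issue can be illustrated with a minimal Java sketch. This is not the committed patch: the class BestEffortRestore, the RestoreAction interface, and the restoreBestEffort method are hypothetical names invented for illustration, and the real HBase restore logic is assumed to be more involved. The sketch only shows the general pattern of attempting every restore step, recording failures, and reporting success as a boolean instead of throwing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BestEffortRestore {

    /** Hypothetical stand-in for one restore step, e.g. restarting one region server. */
    interface RestoreAction {
        void run() throws Exception;
    }

    /**
     * Runs every action even if an earlier one fails. Failures are recorded
     * in the supplied list rather than rethrown, so the caller (a test) can
     * decide whether to log them or ignore them.
     *
     * @return true only if every action completed without an exception
     */
    static boolean restoreBestEffort(List<RestoreAction> actions, List<String> failures) {
        boolean success = true;
        for (RestoreAction action : actions) {
            try {
                action.run();
            } catch (Exception e) {
                // Best effort: remember the failure and continue with the next step.
                success = false;
                failures.add(e.getMessage());
            }
        }
        return success;
    }

    public static void main(String[] args) {
        List<RestoreAction> actions = new ArrayList<>();
        actions.add(() -> { /* pretend this server restarted cleanly */ });
        actions.add(() -> { throw new IOException("rs2 unreachable"); });
        actions.add(() -> { /* still attempted after the failure above */ });

        List<String> failures = new ArrayList<>();
        boolean restored = restoreBestEffort(actions, failures);
        System.out.println("restored=" + restored + " failures=" + failures);
    }
}
```

Returning a boolean rather than throwing lets a test log a warning on a partial restore without being marked as failed, which is the behavior the description argues for.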



--
This message was sent by Atlassian JIRA
(v6.1#6144)
