[ https://issues.apache.org/jira/browse/HBASE-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640073#action_12640073 ]

Andrew Purtell commented on HBASE-932:
--------------------------------------

Our service monitoring and recovery framework detects regionserver shutdowns
and restarts the regionservers. This seems to work pretty well if the fatal
fault was due to, e.g., a transient DFS problem, perhaps related to load. I
suggest a fixed restart limit, plus some backoff if a restart is not successful.
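A minimal sketch of the suggested policy (a fixed restart limit with
exponential backoff between unsuccessful attempts). The restart callback,
the function name, and the default limits are hypothetical and for
illustration only. The comment does not describe the monitoring framework's
actual API or the values it uses.

```python
import time
from typing import Callable


def restart_with_backoff(
    restart: Callable[[], bool],  # hypothetical hook: True if the regionserver came back up
    max_restarts: int = 5,        # illustrative fixed restart limit
    base_delay: float = 1.0,      # seconds to wait after the first failed attempt
    max_delay: float = 60.0,      # upper bound on any single wait
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to restart a failed server, doubling the wait after each failure.

    Returns True as soon as one restart attempt succeeds. Returns False once
    max_restarts attempts have all failed, so the caller can escalate
    (e.g. page an operator) instead of looping forever.
    """
    delay = base_delay
    for attempt in range(1, max_restarts + 1):
        if restart():
            return True
        # No point sleeping after the final attempt; we are about to give up.
        if attempt < max_restarts:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

With base_delay=1.0, a restart that fails twice and then succeeds waits
1.0 s and then 2.0 s before returning True. Passing a fake `sleep` (such as
`list.append`) makes the policy easy to exercise without real delays.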

> Regionserver restart
> --------------------
>
>                 Key: HBASE-932
>                 URL: https://issues.apache.org/jira/browse/HBASE-932
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>
> If we drop a flush or we fail to close a write-ahead log, we currently shut down
> the regionserver (usually we fail because of HDFS). Rather than have
> regionservers shut themselves down, how about they restart? At least in the
> HBASE-930 case, the restart might fix the issue by shaking DFSClient so it
> comes to its senses again. Even if HDFS is bad, it'll come around eventually.
> The HRS restarting itself, plus the HBASE-926 fix, will make for fast recovery.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
