[ https://issues.apache.org/jira/browse/HBASE-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640073#action_12640073 ]
Andrew Purtell commented on HBASE-932:
--------------------------------------
Our service monitoring and recovery framework detects regionserver shutdowns
and restarts them. It seems to work well when the fatal fault was caused by,
e.g., a transient DFS problem, perhaps one related to load. I suggest a fixed
restart limit, plus some backoff when a restart is not successful.
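The restart limit and backoff suggested above could look roughly like the following minimal sketch. Everything in it is hypothetical: `RestartPolicy`, `maxRestarts`, `initialBackoffMs`, `maxBackoffMs` and the `tryRestart` callback are illustrative names, not HBase or monitoring-framework APIs. Capped exponential backoff is an assumed choice, since the comment only asks for "some backoff".

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not an HBase API: bounds the number of restart
// attempts and waits with capped exponential backoff between failures.
class RestartPolicy {
    // Pluggable sleep so the policy can be exercised without real delays.
    interface Sleeper {
        void sleep(long ms) throws InterruptedException;
    }

    private final int maxRestarts;        // fixed restart limit
    private final long initialBackoffMs;  // delay after the first failed attempt
    private final long maxBackoffMs;      // upper bound on any single delay

    RestartPolicy(int maxRestarts, long initialBackoffMs, long maxBackoffMs) {
        if (maxRestarts < 1 || initialBackoffMs < 1 || maxBackoffMs < initialBackoffMs) {
            throw new IllegalArgumentException("invalid restart policy");
        }
        this.maxRestarts = maxRestarts;
        this.initialBackoffMs = initialBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
    }

    // Delay after failed attempt n (0-based): initial * 2^n, capped at max.
    long backoffMs(int attempt) {
        long delay = initialBackoffMs;
        for (int i = 0; i < attempt && delay < maxBackoffMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxBackoffMs);
    }

    // Calls tryRestart until it returns true or the limit is reached.
    // Returns the number of attempts used on success, or -1 if every
    // attempt failed (or a wait was interrupted), meaning the supervisor
    // should give up and leave the regionserver down.
    int restart(BooleanSupplier tryRestart, Sleeper sleeper) {
        for (int attempt = 0; attempt < maxRestarts; attempt++) {
            if (tryRestart.getAsBoolean()) {
                return attempt + 1;
            }
            // No point waiting after the final failed attempt.
            if (attempt + 1 < maxRestarts) {
                try {
                    sleeper.sleep(backoffMs(attempt));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

In a supervisor this might be wired up as `new RestartPolicy(5, 1_000, 60_000).restart(() -> tryStartRegionServer(), Thread::sleep)`, where `tryStartRegionServer` is a hypothetical hook that launches the process and reports whether it came up. Injecting the sleeper keeps the policy testable, and capping the delay keeps a long DFS outage from stretching the waits out indefinitely.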
> Regionserver restart
> --------------------
>
> Key: HBASE-932
> URL: https://issues.apache.org/jira/browse/HBASE-932
> Project: Hadoop HBase
> Issue Type: Improvement
> Reporter: stack
>
> If we drop a flush or fail to close a write-ahead log, we currently shut down
> the regionserver (usually we fail because of HDFS). Rather than shutting
> down, how about regionservers restart themselves? At least in the HBASE-930
> case, the restart might fix the issue by shaking the DFSClient so it comes to
> its senses again. Even if HDFS is bad, it'll come around eventually. The HRS
> restarting itself plus the HBASE-926 fix will make for fast recovery.