[
https://issues.apache.org/jira/browse/HBASE-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
nkeywal updated HBASE-5939:
---------------------------
Description:
When a binary dies on a server, we don't try to restart it, although a restart
would be possible in most cases.
We could have something like:
loop
  start
  wait
  if cleanStop then exit
  if already stopped less than 5 minutes ago then sleep 5 minutes
endloop
This is simple for the master and backup master, but slightly more complex for
the region server, as it can be stopped by a script or by the shutdown procedure.
In the long term, it could allow a restart with exactly the same assignments.
was:
When a binary dies on a server, we don't try to restart it, although a restart
would be possible in most cases.
We could have something like:
loop
  start
  wait
  if cleanStop then exit
  if already stopped less than 5 minutes ago then sleep 1 minute
endloop
This is simple for the master and backup master, but slightly more complex for
the region server, as it can be stopped by a script or by the shutdown procedure.
In the long term, it could allow a restart with exactly the same assignments.
Release Note: When launched with autorestart, HBase processes will
automatically restart unless they are properly terminated, either by a "stop"
command or by a cluster stop. To avoid overloading the system when the server
itself is corrupted and the process cannot be restarted, the server sleeps for
5 minutes before restarting if it was last started less than 5 minutes
earlier. To use it, launch the process with "bin/start-hbase autorestart".
This option is not fully compatible with the existing "restart" command: if
you ask for a restart on a server launched with autorestart, the server will
restart, but the next server instance won't be automatically restarted.
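
The loop in the description and release note can be sketched as follows. This
is only an illustration in Python, not the patch itself (the patch changes the
start scripts). The function name, its parameters, and the idea of detecting a
clean stop from the exit code are all assumptions made for the example; the
5-minute values come from the release note.

```python
import time
from typing import Callable

# Both values come from the release note ("5 minutes").
RESTART_WINDOW_SECONDS = 5 * 60
BACKOFF_SECONDS = 5 * 60


def run_with_autorestart(
    start_and_wait: Callable[[], int],
    is_clean_stop: Callable[[int], bool],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical supervisor loop; returns how many times the process ran.

    start_and_wait: starts the process, blocks until it exits, and returns
                    its exit code (hypothetical hook).
    is_clean_stop:  decides whether the exit was a requested stop. How a
                    clean stop is really detected is not stated in the
                    issue, so this is an assumption.
    """
    starts = 0
    while True:
        started_at = clock()
        starts += 1
        exit_code = start_and_wait()   # "start" then "wait"

        if is_clean_stop(exit_code):   # "if cleanStop then exit"
            return starts

        # The process died shortly after starting: back off before the next
        # attempt so that a broken server does not restart in a tight loop.
        if clock() - started_at < RESTART_WINDOW_SECONDS:
            sleep(BACKOFF_SECONDS)
```

The clock and sleep functions are passed in so that the back-off rule can be
exercised without actually waiting five minutes.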
> Add an autorestart option in the start scripts
> ----------------------------------------------
>
> Key: HBASE-5939
> URL: https://issues.apache.org/jira/browse/HBASE-5939
> Project: HBase
> Issue Type: Improvement
> Components: master, regionserver, scripts
> Affects Versions: 0.96.0
> Reporter: nkeywal
> Assignee: nkeywal
> Priority: Minor
> Fix For: 0.96.0
>
> Attachments: 5939.v4.patch