[ 
https://issues.apache.org/jira/browse/HBASE-15924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588054#comment-15588054
 ] 

Loknath Priyatham Teja Singamsetty  commented on HBASE-15924:
-------------------------------------------------------------

[~apurtell] 

{quote}
One option is to make a PID file of the supervisor, check if the supervisor PID 
file exists and is valid, if so then send a signal to the supervisor to 
terminate it, then terminate the child under watch.
{quote}

Autostart works by placing a marker file, for example regionserver.autostart, 
under HBASE_PID_DIR. When stop is issued, the script first removes this file so 
that autostart no longer restarts the process.
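
To illustrate the marker-file approach described here, a minimal bash sketch 
follows. It is not taken from the patch: the function names are hypothetical, 
the /tmp fallback for HBASE_PID_DIR is an assumption, and the marker name is 
simplified to <service>.autostart.

```shell
#!/usr/bin/env bash
# Sketch only: marker-file handling for autostart.
# All function names are hypothetical; the real hbase-daemon.sh may differ.

# Assumption: fall back to /tmp when HBASE_PID_DIR is not set.
HBASE_PID_DIR="${HBASE_PID_DIR:-/tmp}"

# Print the marker path for a service, e.g. <pid dir>/regionserver.autostart.
autostart_marker() {
  echo "${HBASE_PID_DIR}/$1.autostart"
}

# On "autostart": create the marker so the restart loop stays active.
enable_autostart() {
  mkdir -p "${HBASE_PID_DIR}" && touch "$(autostart_marker "$1")"
}

# On "stop": remove the marker first, before the process is terminated,
# so that no restart is attempted afterwards.
disable_autostart() {
  rm -f "$(autostart_marker "$1")"
}

# The restart loop would check this before each restart attempt.
autostart_enabled() {
  [ -f "$(autostart_marker "$1")" ]
}
```

Because stop deletes the marker before it terminates the process, a restart 
loop that calls autostart_enabled before each attempt should see the marker 
gone and exit instead of starting the service again.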

{quote}
In another test, I started the regionserver with ./bin/hbase-daemon.sh 
--autostart-window-retry-limit 3 autostart regionserver and in another SSH 
session then attempted to stop the regionserver with ./bin/hbase-daemon.sh stop 
regionserver. This appears to work, although I can see by tailing the 
regionserver log output file that the regionserver process is partially 
restarted and rapidly killed.
{quote}

I did not see this behavior in my testing. Could you please share the repro 
steps? I am assuming that you are not using the sfdc packages: the internal 
packages have a different mechanism, based on cron, that starts the process 
again when it is killed. Please re-check whether you are testing on one of our 
internal clusters where cron-based autorestart is already enabled.

Also note that I added a minor enhancement: autostart now waits for 20 seconds 
after the hmaster/regionserver process is killed ungracefully. This gives any 
shutdown hook time to run before autostart triggers the start command.
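
The wait described above could look roughly like the following bash sketch. It 
is not taken from the patch: RESTART_DELAY_SECS and both function names are 
hypothetical, and treating an exit status above 128 as an ungraceful kill is 
an assumption about how such a case might be detected.

```shell
#!/usr/bin/env bash
# Sketch only: pause after an ungraceful exit before autostart restarts.
# RESTART_DELAY_SECS and both function names are hypothetical.

RESTART_DELAY_SECS="${RESTART_DELAY_SECS:-20}"

# Assumption: an exit status above 128 means the process died from a signal
# (128 + signal number), e.g. 137 for SIGKILL or 143 for SIGTERM.
killed_by_signal() {
  [ "$1" -gt 128 ]
}

# Sleep only when the previous run ended ungracefully, so that shutdown
# hooks have time to finish before the start command is issued again.
wait_before_restart() {
  local exit_status="$1"
  if killed_by_signal "$exit_status"; then
    sleep "$RESTART_DELAY_SECS"
  fi
}
```

A restart loop would call wait_before_restart with the exit status of the 
previous run just before it issues the start command again.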

Attached new patch.

 

> Enhance hbase services autorestart capability to hbase-daemon.sh 
> -----------------------------------------------------------------
>
>                 Key: HBASE-15924
>                 URL: https://issues.apache.org/jira/browse/HBASE-15924
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.98.19
>            Reporter: Loknath Priyatham Teja Singamsetty 
>            Assignee: Loknath Priyatham Teja Singamsetty 
>             Fix For: 0.98.24
>
>         Attachments: HBASE-15924.master.0001.patch, 
> HBASE-15924.master.0002.patch, HBASE-15924.master.0003.patch
>
>
> As part of HBASE-5939, autorestart for hbase services was added to 
> deal with scenarios where hbase services (master/regionserver/master-backup) 
> get killed or go down, leading to unplanned outages. The changes were made 
> to hbase-daemon.sh to support the autorestart option. 
> However, the autorestart implementation doesn't work in standalone mode and 
> also has a few gaps compared with the release notes of 
> HBASE-5939. Here is an attempt to re-design and fix the functionality 
> considering all possible use cases with hbase service operations.
> Release Notes of HBASE-5939:
> ------------------------------------------
> When launched with autorestart, HBase processes will automatically restart if 
> they are not properly terminated, either by a "stop" command or by a cluster 
> stop. To ensure that it does not overload the system when the server itself 
> is corrupted and the process cannot be restarted, the server sleeps for 5 
> minutes before restarting if it was already started 5 minutes ago previously. 
> To use it, launch the process with "bin/start-hbase autorestart". This option 
> is not fully compatible with the existing "restart" command: if you ask for a 
> restart on a server launched with autorestart, the server will restart but 
> the next server instance won't be automatically restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
