[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion

Samir Ahmic (JIRA) Sat, 28 Dec 2013 02:27:46 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857994#comment-13857994
 ]


Samir Ahmic commented on HBASE-7386:
------------------------------------

Thanks for the review and comments, [~stack].
In terms of usage and documentation, these scripts follow the same logic as 
the scripts in the bin directory. For example, here is the output of the 
start-supervisord-hbase.sh command:
{code}
$ $HBASE_HOME/bin/supervisord/start-supervisord-hbase.sh
localhost: hbase-ZK: started
hbase-MASTER: started
localhost: hbase-RS: started
{code}
So, in terms of usage, these equivalences hold:
start-hbase.sh ~= start-supervisord-hbase.sh
stop-hbase.sh ~= stop-supervisord-hbase.sh
hbase-daemon.sh ~= hbase-supervisord.sh

I agree there is a danger that the scripts 'rot', but I also believe that this 
approach can solve a number of issues for ops people and generally improve 
HBase MTTR. What is your suggestion for addressing the script 'rot' issue?

graceful-stop.sh from the bin dir can be modified to avoid copy/paste. I will 
also check the rest of the scripts to try to reduce the amount of copy/paste.

migrate_to_supervisord.sh switches a running cluster that was started with the 
scripts from the bin directory over to supervisor. It stops the hbase daemons 
on the nodes using hbase-daemon.sh and then starts them again using the 
hbase-supervisord.sh script (revert_to_scripts.sh does the opposite).
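
As a rough sketch of that sequence for one node (not the actual script): the 
Python below only builds the command lines, it does not run them. It assumes 
that hbase-supervisord.sh takes the same "<action> <daemon>" arguments as 
hbase-daemon.sh and that it lives under bin/supervisord/; both are assumptions 
made for illustration, and migration_commands is a hypothetical helper name.

```python
def migration_commands(hbase_home, daemons):
    """Build the per-node command lines: stop every daemon started by the
    bin scripts, then start the same daemons under supervisor."""
    daemon_sh = f"{hbase_home}/bin/hbase-daemon.sh"
    # Assumed location and argument order, mirroring hbase-daemon.sh.
    supervisord_sh = f"{hbase_home}/bin/supervisord/hbase-supervisord.sh"
    stops = [[daemon_sh, "stop", daemon] for daemon in daemons]
    starts = [[supervisord_sh, "start", daemon] for daemon in daemons]
    return stops + starts
```

Calling migration_commands("/opt/hbase", ["regionserver"]) returns the stop 
command followed by the start command; swapping the two script paths gives the 
revert direction.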

For the master, the znode is removed by the autostart method (patch in 
HMasterCommandLine.java) at startup. We have autorestart=true in the 
supervisor config, so if the master process dies unexpectedly supervisor will 
kick off an autorestart, and at that moment the znode will be removed, giving 
enough time for the backup master to become active. An alternative is to craft 
a listener script, similar to mail_notification.py, that removes the master 
znode when it detects that the process is exiting.
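
For illustration only, a supervisor program section with autorestart=true 
could be generated like this. The program name hbase-MASTER is taken from the 
output above; the install path and the plain 'hbase master start' command line 
are assumptions, and the real settings are in the attached config patches.

```python
import configparser
import io


def master_program_section(hbase_home):
    """Render a supervisor [program:...] section for the master as text."""
    # interpolation=None leaves any '%' in values untouched.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["program:hbase-MASTER"] = {
        # Assumed command line; the patched start-up options may differ.
        "command": f"{hbase_home}/bin/hbase master start",
        # Restart the master when it exits.
        "autorestart": "true",
    }
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()
```

The returned text can be written to a file that supervisord reads as part of 
its configuration.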

Regarding RS znodes, the scripts do not remove them yet. I was thinking about 
a listener script (similar to mail_notification.py) calling 'hbase zkcli rmr 
RSznode', or we can modify HRegionServerCommandLine.java and add 'autorestart' 
like in HMasterCommandLine.java. What is your suggestion for addressing this?
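
A minimal sketch of such a listener, assuming the standard supervisor event 
listener protocol (READY, a header line of key:value tokens, a payload of 
'len' characters, then a RESULT reply). The mapping from process name to znode 
path and the function names are hypothetical; finding the right RS znode is 
the open part and is not solved here.

```python
def parse_tokens(line):
    """Parse supervisor's 'key:value key:value' tokens into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def znode_cleanup_command(payload, znode_for):
    """Return the 'hbase zkcli rmr <znode>' argv for the exited process,
    or None if the process has no znode configured."""
    info = parse_tokens(payload)
    znode = znode_for.get(info.get("processname"))
    if znode is None:
        return None
    return ["hbase", "zkcli", "rmr", znode]


def run_listener(stdin, stdout, znode_for, execute):
    """Serve events until stdin is closed. 'execute' receives the argv to
    run, e.g. subprocess.call in a real deployment."""
    while True:
        stdout.write("READY\n")  # tell supervisor we can take an event
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:
            break
        header = parse_tokens(header_line)
        # 'len' is the payload size; payloads here are plain ASCII tokens.
        payload = stdin.read(int(header["len"]))
        if header["eventname"] == "PROCESS_STATE_EXITED":
            cmd = znode_cleanup_command(payload, znode_for)
            if cmd is not None:
                execute(cmd)
        stdout.write("RESULT 2\nOK")  # acknowledge the event
        stdout.flush()
```

In a real setup this would be run as run_listener(sys.stdin, sys.stdout, 
mapping, subprocess.call) from an [eventlistener:x] section subscribed to 
PROCESS_STATE_EXITED. Passing 'execute' in keeps the loop testable without a 
running cluster.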

Basically, all these scripts are wrappers around the 'supervisord' and 
'supervisorctl' commands, which are Python based. 
I hope I have clarified some details. 

Cheers





> Investigate providing some supervisor support for znode deletion
> ----------------------------------------------------------------
>
>                 Key: HBASE-7386
>                 URL: https://issues.apache.org/jira/browse/HBASE-7386
>             Project: HBase
>          Issue Type: Task
>          Components: master, regionserver, scripts
>            Reporter: Gregory Chanan
>            Assignee: stack
>            Priority: Blocker
>         Attachments: HBASE-7386-bin.patch, HBASE-7386-conf.patch, 
> HBASE-7386-src.patch, HBASE-7386-v0.patch, supervisordconfigs-v0.patch
>
>
> There are a couple of JIRAs for deleting the znode on a process failure:
> HBASE-5844 (RS)
> HBASE-5926 (Master)
> which are pretty neat; on process failure, they delete the znode of the 
> underlying process so HBase can recover faster.
> These JIRAs were implemented via the startup scripts; i.e. the script hangs 
> around and waits for the process to exit, then deletes the znode.
> There are a few problems associated with this approach, as listed in the 
> below JIRAs:
> 1) Hides startup output in script
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 2) two hbase processes listed per launched daemon
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 3) Not run by a real supervisor
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 4) Weird output after kill -9 actual process in standalone mode
> https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
> 5) Can kill existing RS if called again
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 6) Hides stdout/stderr[6]
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
> I suspect running it via something like supervisor.d can solve these issues 
> if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
