[
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857994#comment-13857994
]
Samir Ahmic commented on HBASE-7386:
------------------------------------
Thanks for the review and comments, [~stack].
In terms of usage and documentation, these scripts follow the same logic as
the scripts in the bin directory. For example, here is the output of the
start-supervisord-hbase.sh command:
{code}
$ $HBASE_HOME/bin/supervisord/start-supervisord-hbase.sh
localhost: hbase-ZK: started
hbase-MASTER: started
localhost: hbase-RS: started
{code}
So, in terms of usage, the following correspondences hold:
start-hbase.sh ~= start-supervisord-hbase.sh
stop-hbase.sh ~= stop-supervisord-hbase.sh
hbase-daemon.sh ~= hbase-supervisord.sh
I agree there is a danger that the scripts will 'rot', but I also believe this
approach can solve a number of issues for ops people and generally improve
HBase MTTR. How would you suggest we address the script 'rot' issue?
graceful-stop.sh in the bin directory can be modified to avoid the copy/paste.
I will also go through the rest of the scripts to reduce the amount of
copy/paste.
migrate_to_supervisord.sh switches a running cluster that was started with the
scripts in the bin directory over to supervisor. It stops the HBase daemons on
each node using hbase-daemon.sh and then starts them again using the
hbase-supervisord.sh script (revert_to_scripts.sh does the opposite).
For the master, the znode is removed at startup by the autostart method
(patched in HMasterCommandLine.java). The supervisor config sets
autorestart=true, so if the master process dies unexpectedly, supervisor
restarts it; at that moment the znode is removed, giving the backup master
enough time to become active. The alternative is to write a listener script,
similar to mail_notification.py, that removes the master znode when it detects
that the process is exiting.
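To make the autorestart setup concrete, here is a minimal sketch in Python (the language supervisord itself is written in) that renders a supervisord config fragment with the standard configparser module. The program names hbase-MASTER and hbase-RS follow the start-supervisord-hbase.sh output above; the /opt/hbase path, the "bin/hbase master start" style command lines and the znode_cleanup_listener.py name are assumptions for illustration, not the configs in supervisordconfigs-v0.patch.

```python
import configparser
import io


def supervisor_config(hbase_home="/opt/hbase"):
    """Render a supervisord config fragment for an HBase master and RS.

    The install path, program names and listener script name are
    illustrative assumptions, not the configs shipped with the patch.
    """
    # interpolation=None so values are written out verbatim.
    config = configparser.ConfigParser(interpolation=None)
    # supervisord manages foreground processes, so run bin/hbase directly
    # instead of hbase-daemon.sh, which puts the JVM in the background.
    config["program:hbase-MASTER"] = {
        "command": f"{hbase_home}/bin/hbase master start",
        # Restart the master whenever it exits.
        "autorestart": "true",
    }
    config["program:hbase-RS"] = {
        "command": f"{hbase_home}/bin/hbase regionserver start",
        "autorestart": "true",
    }
    # The listener alternative: supervisord notifies this (hypothetical)
    # script every time a managed process exits.
    config["eventlistener:znode-cleanup"] = {
        "command": f"{hbase_home}/bin/supervisord/znode_cleanup_listener.py",
        "events": "PROCESS_STATE_EXITED",
    }
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(supervisor_config())
```

supervisord reads INI-style files, so the rendered "key = value" sections can be dropped into its config. Generating the fragment from one function is one way to keep per-node configs from drifting, which is related to the 'rot' concern above.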
The scripts do not remove RS znodes yet. I was thinking of either a listener
script (similar to mail_notification.py) that calls 'hbase zkcli rmr RSznode',
or modifying HRegionServerCommandLine.java to add 'autorestart' as in
HMasterCommandLine.java. How would you suggest addressing this?
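For the listener option, a minimal sketch of a supervisord event listener might look like the code below. It speaks supervisord's listener protocol (write READY, read a header line and a len-byte payload, answer with RESULT) and runs 'hbase zkcli rmr <znode>' on an unexpected PROCESS_STATE_EXITED. The ZNODES mapping, the default /hbase parent znode and the script itself are assumptions, not part of the attached patches. RS znodes are deliberately left out because their name includes the server start code, which the listener would have to look up first.

```python
import os
import subprocess
import sys

# Hypothetical mapping from supervisor program names to the znode to delete
# when that program exits unexpectedly. Assumes the default
# zookeeper.znode.parent of /hbase. Region servers are left out: their znode
# is /hbase/rs/<host>,<port>,<startcode>, so the start code would have to be
# looked up first (for example by listing /hbase/rs).
ZNODES = {
    "hbase-MASTER": "/hbase/master",
}


def parse_tokens(line):
    """Split a 'key:value key:value ...' line into a dict.

    supervisord uses this format for the event header line and for the
    payload of PROCESS_STATE_* events.
    """
    return dict(token.split(":", 1) for token in line.split())


def znode_cleanup_command(header, payload):
    """Return the zkcli command to run for one event, or None to skip it."""
    if header.get("eventname") != "PROCESS_STATE_EXITED":
        return None
    # expected:1 means the exit code was in the program's exitcodes list,
    # i.e. a normal shutdown; only clean up after unexpected exits.
    if payload.get("expected") == "1":
        return None
    znode = ZNODES.get(payload.get("processname"))
    if znode is None:
        return None
    return ["hbase", "zkcli", "rmr", znode]


def main():
    while True:
        # Tell supervisord we are ready to receive the next event.
        sys.stdout.write("READY\n")
        sys.stdout.flush()
        line = sys.stdin.readline()
        if not line:  # stdin closed: supervisord has gone away
            return
        header = parse_tokens(line)
        # The payload is exactly `len` bytes with no trailing newline;
        # process-state payloads are ASCII, so characters == bytes here.
        payload = parse_tokens(sys.stdin.read(int(header["len"])))
        command = znode_cleanup_command(header, payload)
        if command is not None:
            # stdout is the protocol channel and stdin carries events, so
            # keep the child process away from both.
            subprocess.call(command, stdin=subprocess.DEVNULL, stdout=sys.stderr)
        # Acknowledge the event: "RESULT <length>\n" followed by the body.
        sys.stdout.write("RESULT 2\nOK")
        sys.stdout.flush()


# supervisord sets SUPERVISOR_ENABLED in its children's environment; only
# speak the listener protocol when actually launched by it.
if __name__ == "__main__" and os.environ.get("SUPERVISOR_ENABLED"):
    main()
```

The decision logic is kept in pure functions so it can be checked without a running supervisord or ZooKeeper; only main() touches stdin, stdout and the zkcli process.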
Basically, all of these scripts are wrappers around the Python-based
'supervisord' and 'supervisorctl' commands.
I hope I have clarified some details.
Cheers
> Investigate providing some supervisor support for znode deletion
> ----------------------------------------------------------------
>
> Key: HBASE-7386
> URL: https://issues.apache.org/jira/browse/HBASE-7386
> Project: HBase
> Issue Type: Task
> Components: master, regionserver, scripts
> Reporter: Gregory Chanan
> Assignee: stack
> Priority: Blocker
> Attachments: HBASE-7386-bin.patch, HBASE-7386-conf.patch,
> HBASE-7386-src.patch, HBASE-7386-v0.patch, supervisordconfigs-v0.patch
>
>
> There are a couple of JIRAs for deleting the znode on a process failure:
> HBASE-5844 (RS)
> HBASE-5926 (Master)
> which are pretty neat; on process failure, they delete the znode of the
> underlying process so HBase can recover faster.
> These JIRAs were implemented via the startup scripts; i.e. the script hangs
> around and waits for the process to exit, then deletes the znode.
> There are a few problems associated with this approach, as listed in the
> JIRA comments below:
> 1) Hides startup output in script
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 2) two hbase processes listed per launched daemon
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 3) Not run by a real supervisor
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 4) Weird output after kill -9 actual process in standalone mode
> https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
> 5) Can kill existing RS if called again
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 6) Hides stdout/stderr
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
> I suspect running it via something like supervisor.d can solve these issues
> if we provide the right support.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)