[
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096776#comment-16096776
]
Samir Ahmic commented on HBASE-7386:
------------------------------------
Thanks [~stack]. Why Python supervisor? Well, we originally started this story
around it, and after some time testing it, at least for me, choosing a mature
and well-proven process control system instead of writing custom bash scripts
has multiple advantages.
To be honest, the work here extends the original issue of just removing stale
znodes to creating a watchdog over HBase processes and offering an alternative
option for managing the cluster. But once we started tackling the supervisor
approach, why not offer folks the chance to worry less when an RS process
dies, because it will be automatically restarted :)
Also, Python supervisor has a set of very cool features such as auto-restart,
event listeners (that may execute arbitrary code based on process state), and
so on, and folks may start creating their own listeners for different purposes.
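To make the event listener idea concrete, here is a minimal sketch of one,
assuming supervisor's documented stdin/stdout listener protocol (READY, a
header line of key:value tokens, a payload of `len` bytes, then a RESULT
reply). The names run_listener, parse_tokens and handle_exit are hypothetical
and not taken from any of the attached patches, and the znode cleanup is only
a placeholder string.

```python
def parse_tokens(line):
    """Parse a 'key:value key:value' line into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def handle_exit(process_name):
    # Hypothetical hook: a real listener might delete the stale znode of the
    # dead process here. This sketch only reports what it would do.
    return "would clean up znode for %s" % process_name


def run_listener(stdin, stdout, handler=handle_exit):
    """Serve supervisor events until stdin is closed; return handler results."""
    handled = []
    while True:
        # Tell supervisor we are ready to receive the next event.
        stdout.write("READY\n")
        stdout.flush()

        header_line = stdin.readline()
        if not header_line:
            break  # supervisor closed the pipe
        header = parse_tokens(header_line)

        # 'len' is the payload size in bytes; the payload is assumed to be
        # ASCII here, so reading that many characters is equivalent.
        payload = stdin.read(int(header["len"]))

        if header["eventname"] == "PROCESS_STATE_EXITED":
            body = parse_tokens(payload)
            handled.append(handler(body["processname"]))

        # Acknowledge the event: "RESULT <length>\n" followed by the body "OK".
        stdout.write("RESULT 2\nOK")
        stdout.flush()
    return handled
```

A real listener would call run_listener(sys.stdin, sys.stdout) and send any
logging to stderr, since stdout is reserved for the protocol itself.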
Btw, I will address the shellcheck and pylint issues in the next patch.
> Investigate providing some supervisor support for znode deletion
> ----------------------------------------------------------------
>
> Key: HBASE-7386
> URL: https://issues.apache.org/jira/browse/HBASE-7386
> Project: HBase
> Issue Type: Task
> Components: master, regionserver, scripts
> Reporter: Gregory Chanan
> Assignee: stack
> Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HBASE-7386-bin.patch, HBASE-7386-bin-v2.patch,
> HBASE-7386-bin-v3.patch, HBASE-7386-conf.patch, HBASE-7386-conf-v2.patch,
> HBASE-7386-conf-v3.patch, HBASE-7386-master-00.patch, HBASE-7386-src.patch,
> HBASE-7386-v0.patch, supervisordconfigs-v0.patch
>
>
> There are a couple of JIRAs for deleting the znode on a process failure:
> HBASE-5844 (RS)
> HBASE-5926 (Master)
> which are pretty neat; on process failure, they delete the znode of the
> underlying process so HBase can recover faster.
> These JIRAs were implemented via the startup scripts; i.e. the script hangs
> around and waits for the process to exit, then deletes the znode.
> There are a few problems associated with this approach, as listed in the
> below JIRAs:
> 1) Hides startup output in script
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 2) two hbase processes listed per launched daemon
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 3) Not run by a real supervisor
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 4) Weird output after kill -9 actual process in standalone mode
> https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
> 5) Can kill existing RS if called again
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 6) Hides stdout/stderr[6]
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
> I suspect running it via something like supervisor.d can solve these issues
> if we provide the right support.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)