[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion

Samir Ahmic (JIRA) Fri, 21 Jul 2017 13:07:28 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096776#comment-16096776
 ]


Samir Ahmic commented on HBASE-7386:
------------------------------------

Thanks [~stack]. Why Python supervisor? Well, we originally started this story 
around it, and after some time testing it, at least for me, choosing a mature 
and well-proven process control system instead of writing custom bash scripts 
has multiple advantages. 
To be honest, the work here extends the original issue of just removing stale 
znodes to creating a watchdog over HBase processes and offering an alternative 
option for managing the cluster. But once we started tackling the supervisor 
approach, why not offer folks the chance to worry less when an RS process dies, 
because it will be automatically restarted :) 
Also, Python supervisor has a set of very cool features, such as auto-restart and 
event listeners (which may execute arbitrary code based on process state), 
and folks may start creating their own listeners for different purposes.
Btw, I will address the shellcheck and pylint issues in the next patch. 
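
To illustrate the event-listener idea, here is a minimal sketch of a listener 
that reacts to unexpected process exits. It is not taken from the attached 
patches. It assumes supervisor's documented listener protocol (the listener 
prints READY, reads a header line and a payload of "len" characters from 
stdin, then answers with a RESULT block). The names handle_event, run_listener 
and delete_znode are hypothetical, and delete_znode is only a placeholder for 
whatever cleanup the real scripts would do.

```python
import sys


def parse_tokens(line):
    """Parse a 'key:value key:value' line into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def handle_event(header_line, payload, on_unexpected_exit):
    """Handle one event and return the reply supervisor expects."""
    headers = parse_tokens(header_line)
    if headers.get("eventname") == "PROCESS_STATE_EXITED":
        info = parse_tokens(payload)
        # expected:0 means the exit code was not one of the configured ones.
        if info.get("expected") == "0":
            on_unexpected_exit(info["processname"])
    # Acknowledge the event: "RESULT <length>\n<body>" with body "OK".
    return "RESULT 2\nOK"


def delete_znode(process_name):
    """Hypothetical placeholder: real cleanup would remove the stale znode."""
    # Listeners must keep stdout for the protocol, so log to stderr.
    sys.stderr.write("would delete znode for %s\n" % process_name)
    sys.stderr.flush()


def run_listener(stdin=sys.stdin, stdout=sys.stdout):
    """Main loop: announce readiness, read one event, reply, repeat."""
    while True:
        stdout.write("READY\n")
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:
            break  # supervisor closed our stdin
        headers = parse_tokens(header_line)
        # Assumes an ASCII payload, so characters and bytes coincide.
        payload = stdin.read(int(headers["len"]))
        stdout.write(handle_event(header_line, payload, delete_znode))
        stdout.flush()
```

A real listener script would end with a call to run_listener() and would be 
registered in an [eventlistener:x] section of the supervisord configuration, 
subscribed to PROCESS_STATE_EXITED events.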

> Investigate providing some supervisor support for znode deletion
> ----------------------------------------------------------------
>
>                 Key: HBASE-7386
>                 URL: https://issues.apache.org/jira/browse/HBASE-7386
>             Project: HBase
>          Issue Type: Task
>          Components: master, regionserver, scripts
>            Reporter: Gregory Chanan
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 3.0.0
>
>         Attachments: HBASE-7386-bin.patch, HBASE-7386-bin-v2.patch, 
> HBASE-7386-bin-v3.patch, HBASE-7386-conf.patch, HBASE-7386-conf-v2.patch, 
> HBASE-7386-conf-v3.patch, HBASE-7386-master-00.patch, HBASE-7386-src.patch, 
> HBASE-7386-v0.patch, supervisordconfigs-v0.patch
>
>
> There are a couple of JIRAs for deleting the znode on a process failure:
> HBASE-5844 (RS)
> HBASE-5926 (Master)
> which are pretty neat; on process failure, they delete the znode of the 
> underlying process so HBase can recover faster.
> These JIRAs were implemented via the startup scripts; i.e. the script hangs 
> around and waits for the process to exit, then deletes the znode.
> There are a few problems associated with this approach, as listed in the 
> below JIRAs:
> 1) Hides startup output in script
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 2) two hbase processes listed per launched daemon
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 3) Not run by a real supervisor
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 4) Weird output after kill -9 actual process in standalone mode
> https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
> 5) Can kill existing RS if called again
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 6) Hides stdout/stderr[6]
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
> I suspect running it via something like supervisor.d can solve these issues 
> if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
