[
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869629#comment-13869629
]
Nicolas Liochon commented on HBASE-7386:
----------------------------------------
Thanks a lot for the fix of HBASE-10310, Samir. I went through your patch.
It's a difficult read when you don't know supervisor ;-). The definition of
'PROCESS_STATE_UNKNOWN' is a little scary (as we kill the region server when we
reach this state).
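
For readers who don't know supervisor, a minimal sketch of where such a decision sits in an event listener. It assumes supervisor's documented header format (space-separated key:value tokens on one line); should_clean_up and its default state list are hypothetical and not taken from the patch. The supervisor documentation appears to describe PROCESS_STATE_UNKNOWN as indicating an error in supervisord itself, so this sketch leaves it out of the states that trigger a cleanup.

```python
def parse_event_header(line):
    """Split a supervisor event header line into a dict of its key:value tokens."""
    return dict(token.split(":", 1) for token in line.split())


def should_clean_up(header, states=("PROCESS_STATE_EXITED", "PROCESS_STATE_FATAL")):
    """Hypothetical policy: act only on states where the process is really gone."""
    return header.get("eventname") in states


# Example header, in the shape supervisor sends to a listener on stdin.
header = parse_event_header(
    "ver:3.0 server:supervisor serial:21 pool:listener "
    "poolserial:10 eventname:PROCESS_STATE_UNKNOWN len:54"
)
print(header["eventname"], should_clean_up(header))
```

Which states should trigger a kill or a znode deletion is a policy choice for the patch; the sketch only shows that the choice can be kept in one explicit place.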
There are some typos ('Test is supevisored installed' instead of supervisord).
I'm not sure about stuff like 'subprocess.call('/bin/mail -s
"HBASE_PROCESS_EVENT" %s < %s'%(email, tmp_file), shell=True)': seems machine
dependent, there is no /bin/mail on my Ubuntu desktop.
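
A less machine-dependent variant might look like the sketch below. It assumes Python 3 (for shutil.which and subprocess.run) and a mail binary that accepts '-s SUBJECT RECIPIENT' with the body on stdin; send_event_mail and its parameters are hypothetical names, not part of the patch.

```python
import shutil
import subprocess


def send_event_mail(email, tmp_file, subject="HBASE_PROCESS_EVENT", mail_name="mail"):
    """Mail the contents of tmp_file; return False if no mail binary is found."""
    # Look the binary up on PATH instead of hard-coding /bin/mail.
    mail_path = shutil.which(mail_name)
    if mail_path is None:
        return False
    # Pass the arguments as a list and feed the file on stdin, so no shell
    # (and no shell quoting of email or tmp_file) is involved.
    with open(tmp_file, "rb") as body:
        result = subprocess.run([mail_path, "-s", subject, email], stdin=body)
    return result.returncode == 0
```

The caller can then log or skip the notification when the function returns False, rather than failing on hosts without a mail client.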
Do we have to use Python?
It would be good to have a review from someone who knows supervisor... As well,
this should be documented in the hbase reference guide imho.
> Investigate providing some supervisor support for znode deletion
> ----------------------------------------------------------------
>
> Key: HBASE-7386
> URL: https://issues.apache.org/jira/browse/HBASE-7386
> Project: HBase
> Issue Type: Task
> Components: master, regionserver, scripts
> Reporter: Gregory Chanan
> Assignee: stack
> Priority: Blocker
> Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch,
> HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch,
> HBASE-7386-v0.patch, supervisordconfigs-v0.patch
>
>
> There are a couple of JIRAs for deleting the znode on a process failure:
> HBASE-5844 (RS)
> HBASE-5926 (Master)
> which are pretty neat; on process failure, they delete the znode of the
> underlying process so HBase can recover faster.
> These JIRAs were implemented via the startup scripts; i.e. the script hangs
> around and waits for the process to exit, then deletes the znode.
> There are a few problems associated with this approach, as listed in the
> below JIRAs:
> 1) Hides startup output in script
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 2) two hbase processes listed per launched daemon
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 3) Not run by a real supervisor
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 4) Weird output after kill -9 actual process in standalone mode
> https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
> 5) Can kill existing RS if called again
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 6) Hides stdout/stderr[6]
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
> I suspect running it via something like supervisor.d can solve these issues
> if we provide the right support.