[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216863#comment-13216863 ]
stack commented on HBASE-5075:
------------------------------
@zhiyuan.dai What do you think of using supervisor or one of the other
babysitting programs instead of writing our own from scratch? If you need
hbase regionservers to dump out their servername so you know what to kill up
in zk, that can be done easily enough.
> regionserver crashed and failover
> ---------------------------------
>
> Key: HBASE-5075
> URL: https://issues.apache.org/jira/browse/HBASE-5075
> Project: HBase
> Issue Type: Improvement
> Components: monitoring, regionserver, replication, zookeeper
> Affects Versions: 0.92.1
> Reporter: zhiyuan.dai
> Fix For: 0.90.5
>
> Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch,
> HBase-5075-src.patch
>
>
> When a regionserver crashes, it takes too long to notify the hmaster. Once
> the hmaster knows the regionserver is down, it also takes a long time to
> fetch the hlog's lease.
> hbase is an online db, so availability is very important.
> I have an idea to improve availability: a monitor node checks the
> regionserver's pid. If the pid does not exist, I consider the rs down,
> delete its znode, and force-close the hlog file.
> The check period could be around 100ms.
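
A minimal sketch of the pid check described in the issue, in Python and for
POSIX hosts only. It is not taken from the attached patches. The names
pid_alive, watch, and on_dead are hypothetical; on_dead stands in for
whatever cleanup the monitor would perform (deleting the znode, force-closing
the hlog), which is deliberately left out here.

```python
import os
import time
from typing import Callable, Optional


def pid_alive(pid: int) -> bool:
    """Return True if a process with this pid currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permissions
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but is owned by another user
    return True


def watch(pid: int,
          on_dead: Callable[[int], None],
          period_s: float = 0.1,
          max_polls: Optional[int] = None) -> bool:
    """Poll pid every period_s seconds and call on_dead(pid) once it is gone.

    on_dead is a hypothetical hook for the cleanup step (znode, hlog).
    Returns True if the process was seen dead, False if max_polls ran out.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if not pid_alive(pid):
            on_dead(pid)
            return True
        time.sleep(period_s)
        polls += 1
    return False
```

Two caveats with this approach: a process that has exited but has not been
reaped by its parent (a zombie) still passes the signal-0 check, and a pid can
be reused by an unrelated process, so a bare pid check can report a dead
regionserver as alive.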