[jira] [Commented] (HBASE-5075) regionserver crashed and failover

zhiyuan.dai (Commented) (JIRA) Sun, 19 Feb 2012 22:40:04 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211691#comment-13211691
 ]


zhiyuan.dai commented on HBASE-5075:
------------------------------------

@stack @Lars Hofhansl
First, the RPC method getRSPidAndRsZknode fetches the PID and the znode, which 
includes the domain and service port; this approach is reliable. If we relied on 
the process list instead, there could be some misjudgment.

Second, there is a supervisor called RegionServerFailureDetection. We first start 
the regionserver and then start RegionServerFailureDetection. 
RegionServerFailureDetection is a watchdog for the RegionServer.

The supervisor (RegionServerFailureDetection) then fetches the regionserver's PID 
and znode by calling getRSPidAndRsZknode.

RegionServerFailureDetection doesn't have any relationship with long GC.

RegionServerFailureDetection first checks whether the PID is alive and then 
checks whether the service port is alive.
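
As a rough illustration of these two checks (this is not the code in the 
attached patch), here is a minimal Python sketch, assuming a POSIX host. The 
helper names pid_alive, port_alive and regionserver_alive are hypothetical, and 
the PID, host and port are assumed to have been obtained already, for example 
through getRSPidAndRsZknode.

```python
import os
import socket


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        # Signal 0 sends nothing; it only performs the existence/permission check.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def port_alive(host: str, port: int, timeout: float = 0.1) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


def regionserver_alive(pid: int, host: str, port: int) -> bool:
    """Check the PID first, then the service port (hypothetical helper)."""
    return pid_alive(pid) and port_alive(host, port)
```

A watchdog could call regionserver_alive on a short period and treat a False 
result as the trigger for cleanup; what the cleanup does (deleting the znode, 
closing the hlog) is outside this sketch.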
                
> regionserver crashed and failover
> ---------------------------------
>
>                 Key: HBASE-5075
>                 URL: https://issues.apache.org/jira/browse/HBASE-5075
>             Project: HBase
>          Issue Type: Improvement
>          Components: monitoring, regionserver, replication, zookeeper
>    Affects Versions: 0.92.1
>            Reporter: zhiyuan.dai
>             Fix For: 0.90.5
>
>         Attachments: Degion of Failure Detection.pdf, HBase-5075-src.patch
>
>
> When a regionserver crashes, it takes too long to notify the hmaster. Once the 
> hmaster knows about the regionserver's shutdown, it takes a long time to fetch 
> the hlog's lease.
> HBase is an online DB, so availability is very important.
> I have an idea to improve availability: a monitor node checks the 
> regionserver's PID. If this PID does not exist, I consider the RS down, delete 
> the znode, and force-close the hlog file.
> The check period could be around 100ms.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
