[ 
https://issues.apache.org/jira/browse/HBASE-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641541#comment-13641541
 ] 

Benoit Sigoure commented on HBASE-6290:
---------------------------------------

Yes, that's right.  It would be great to have this sort of kill switch, both at
the HBase level and in HDFS.  The feature I presented works especially well for
telling all interested parties (clients) that the node they're trying to reach
is dead, but often it doesn't help time the node out of the cluster: in HDFS or
MapReduce, for example, the NameNode and JobTracker will ignore TCP resets and
will not flag the node as really dead until some long pre-configured timeout
elapses.
                
> Add a function to mark a server as dead and start the recovery process
> ----------------------------------------------------------------------
>
>                 Key: HBASE-6290
>                 URL: https://issues.apache.org/jira/browse/HBASE-6290
>             Project: HBase
>          Issue Type: Improvement
>          Components: monitoring
>    Affects Versions: 0.95.2
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>            Priority: Minor
>              Labels: noob
>
> ZooKeeper is used as a monitoring tool: we use znodes and we start the recovery 
> process when a znode is deleted by ZK because it got a timeout. This timeout 
> defaults to 90 seconds, and is often set to 30s.
> However, some HW issues could be detected by specialized HW monitoring tools 
> before the ZK timeout. For this reason, it makes sense to offer a very simple 
> function to mark a RS as dead. This should not take in
> It could be an HBase shell function such as
> considerAsDead ipAddress|serverName
> This would delete all the znodes of the server running on this box, starting 
> the recovery process.
> Such a function would be easily callable (at the caller's risk) by any fault 
> detection tool... However, we could have issues identifying the right master 
> & region servers around ipv4 vs ipv6 and multi-networked boxes.
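
The selection step behind a considerAsDead-style command could be sketched
roughly as follows. This is only an illustration, not the implementation
proposed in the issue: the function names (parse_server_name,
select_rs_znodes) are hypothetical, and it assumes region server znodes are
children of /hbase/rs named host,port,startcode. It does no ZooKeeper I/O and
only matches on the host name string, so the ipv4 vs ipv6 and multi-homed
cases mentioned above are left unsolved.

```python
def parse_server_name(znode_name):
    """Split a 'host,port,startcode' server name into its three parts."""
    host, port, startcode = znode_name.split(",")
    return host, int(port), int(startcode)


def select_rs_znodes(children, target, rs_parent="/hbase/rs"):
    """Return the znode paths that would be deleted for `target`.

    `children` is the list of child names under the region server parent
    znode (assumed layout). `target` is either a full server name
    (host,port,startcode) or a bare host name; a bare host name matches
    every region server running on that host.
    """
    selected = []
    for name in children:
        host, _port, _startcode = parse_server_name(name)
        if name == target or host == target:
            selected.append(rs_parent + "/" + name)
    return sorted(selected)
```

A fault detection tool would pass the returned paths to whatever ZooKeeper
client it uses to delete them; deleting the znodes is what would start the
recovery process, at the caller's risk.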
