[
https://issues.apache.org/jira/browse/HBASE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603741#comment-16603741
]
David Manning commented on HBASE-21126:
---------------------------------------
Thanks, Josh. I very much appreciate your work and your help in updating
all the branch numbers. (I didn't know exactly which to select, so I'm very
glad to have your expertise.)
> Add ability for HBase Canary to ignore a configurable number of ZooKeeper
> down nodes
> ------------------------------------------------------------------------------------
>
> Key: HBASE-21126
> URL: https://issues.apache.org/jira/browse/HBASE-21126
> Project: HBase
> Issue Type: Improvement
> Components: canary, Zookeeper
> Affects Versions: 1.0.0, 3.0.0, 2.0.0
> Reporter: David Manning
> Assignee: David Manning
> Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-21126.branch-1.001.patch,
> HBASE-21126.master.001.patch, HBASE-21126.master.002.patch,
> HBASE-21126.master.003.patch, zookeeperCanaryLocalTestValidation.txt
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> When running org.apache.hadoop.hbase.tool.Canary with args -zookeeper
> -treatFailureAsError, the Canary will try to get a znode from each ZooKeeper
> server in the ensemble. If any server is unavailable or unresponsive, the
> canary will exit with a failure code.
> If we use the Canary to gauge server health, and alert accordingly, this can
> be too strict. For example, in a 5-node ZooKeeper cluster, having one node
> down is safe and expected in rolling upgrades/patches.
> This is a request to allow the Canary to take another parameter:
> {code:java}
> -permittedZookeeperFailures <N>{code}
> If N=1, in the 5-node ZooKeeper ensemble example, then the Canary will still
> pass if at least 4 ZooKeeper nodes are reachable, but fail if 3 or fewer are
> reachable.
> (This is my first Jira posting... sorry if I messed anything up.)
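
To make the requested pass/fail rule concrete, here is a minimal sketch of the
threshold check described above. The class name ZkCanaryThreshold and the method
passes are hypothetical and chosen only for illustration; this is not the code
from the attached patches, and it assumes the Canary already knows how many
ensemble members answered the znode read.

```java
// Illustrative sketch only: ZkCanaryThreshold and passes() are hypothetical
// names, not part of org.apache.hadoop.hbase.tool.Canary.
final class ZkCanaryThreshold {
    private ZkCanaryThreshold() {}

    /**
     * Decides whether a ZooKeeper canary run should pass.
     *
     * @param ensembleSize      total number of servers in the ensemble
     * @param reachable         servers that returned the znode successfully
     * @param permittedFailures the N from -permittedZookeeperFailures
     * @return true if the number of unreachable servers is at most N
     */
    static boolean passes(int ensembleSize, int reachable, int permittedFailures) {
        if (ensembleSize < 0 || permittedFailures < 0
                || reachable < 0 || reachable > ensembleSize) {
            throw new IllegalArgumentException("inconsistent counts");
        }
        int failures = ensembleSize - reachable;
        return failures <= permittedFailures;
    }

    public static void main(String[] args) {
        // The 5-node example from the description, with N=1.
        System.out.println(passes(5, 4, 1)); // prints "true"
        System.out.println(passes(5, 3, 1)); // prints "false"
    }
}
```

With N=0 the check reduces to the current strict behavior, where any single
unreachable server fails the run.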