[
https://issues.apache.org/jira/browse/HBASE-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Kyle Purtell resolved HBASE-23206.
-----------------------------------------
Resolution: Won't Fix
> ZK quorum redundancy with failover in RZK
> -----------------------------------------
>
> Key: HBASE-23206
> URL: https://issues.apache.org/jira/browse/HBASE-23206
> Project: HBase
> Issue Type: Brainstorming
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> We have faced a few production issues where the reliability of the ZooKeeper
> quorum serving the cluster has not been as robust as expected. The most
> recent one was essentially ZOOKEEPER-2164 (and related: ZOOKEEPER-900). These
> can be mitigated by a ZK server configuration change but the incidents
> suggest it may be worth thinking about how to be less reliant on the service
> provided by a single ZK quorum instance.
> A holistic solution would have several parts:
> - HBASE-18095 to get ZK dependencies out of the client
> - Related HBase replication improvements to track peer and position state in
> HBase tables instead of znodes
> - This brainstorming...
> For this issue, RecoverableZooKeeper (RZK) might be taught how to speak to
> two separate ZK quorums redundantly, so ZK client operations via RZK succeed
> even if one of them is temporarily unable to provide service. The loss of one
> of a pair (or more) of redundant quorums would no longer impact availability
> of the HBase service.
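
To make the RZK idea above concrete, here is a minimal sketch of read failover across redundant quorums. It is an illustration under stated assumptions, not the HBase implementation: QuorumClient, readWithFailover, and the /hbase/master path are hypothetical names, and the sketch uses no real ZooKeeper or RecoverableZooKeeper API. It covers reads only; writes, watches, and ephemeral nodes would need a consistency story across the quorums that this sketch does not attempt.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical sketch only; not an HBase or ZooKeeper API. */
public class RedundantQuorumSketch {

    /** Stand-in for a client bound to a single ZK quorum (assumed interface). */
    public interface QuorumClient {
        byte[] getData(String path) throws Exception;
    }

    /**
     * Tries each configured quorum in order and returns the first successful
     * read. If every quorum fails, the last failure is rethrown.
     */
    public static byte[] readWithFailover(List<QuorumClient> quorums, String path)
            throws Exception {
        if (quorums.isEmpty()) {
            throw new IllegalArgumentException("no quorums configured");
        }
        Exception last = null;
        for (QuorumClient quorum : quorums) {
            try {
                return quorum.getData(path);
            } catch (Exception e) {
                last = e; // remember the failure and fall through to the next quorum
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // First quorum simulates an outage; second quorum answers.
        QuorumClient down = path -> { throw new IOException("quorum A unavailable"); };
        QuorumClient up = path -> "master-1".getBytes(StandardCharsets.UTF_8);

        byte[] data = readWithFailover(List.of(down, up), "/hbase/master");
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}
```

Trying quorums sequentially keeps the sketch simple; a real design would also have to decide how the quorums stay in agreement and which one is authoritative after a partition heals.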