[ https://issues.apache.org/jira/browse/HBASE-11963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133088#comment-14133088 ]
Lars Hofhansl commented on HBASE-11963:
---------------------------------------
Also lemme explain what happened:
* We have a ReplicationPeer per slave cluster
* We have a ReplicationSource for every "queue" to replicate. A queue is either
the data this region server wishes to replicate or data it took over from
another region server (for example when that region server went down).
* When we take over a queue from another region server we have *multiple*
ReplicationSources replicating to the *same* set of ReplicationPeers.
* When the slave cluster is down, the ReplicationSources attempt to reset their
peers upon each failed request.
* And hence we now have a race where multiple ReplicationSources attempt to
reconnect a peer simultaneously. That race is what leaked ZK clients.
* Each of the leaked clients would attempt to reconnect to the slave once/sec
until the ZK timeout (defaulting to 180s).
So this only happens when (a) we have some queues failed over from another
region server *and* (b) a peer is not currently reachable (or there are some
other ZK issues), causing the source to reconnect its peer.
But if we have this condition it gets nasty pretty quickly.
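
To make the race concrete, here is a minimal sketch of the general idea named in the issue title: serialize reconnect attempts on a shared peer and rate limit them, so that several sources failing at once replace the client only once and never leave an old client open. This is not the code in the attached patches. The class PeerConnectionSketch, its reconnect method, the minIntervalMs parameter, and the plain Object standing in for a ZK client are all hypothetical names chosen for illustration.

```java
/**
 * Hypothetical stand-in for the per-peer connection state shared by
 * several replication sources. Not an HBase class.
 */
class PeerConnectionSketch {
    private final Object lock = new Object();
    private final long minIntervalMs; // assumed rate limit between reconnects
    private Object client;            // stand-in for a ZooKeeper client handle
    private long lastReconnectMs;
    private int opened;               // clients created so far
    private int closed;               // clients closed so far

    PeerConnectionSketch(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /**
     * Called by a source after a failed request. Returns true only if this
     * call actually replaced the client. The caller supplies the current
     * time so the behavior is deterministic.
     */
    boolean reconnect(long nowMs) {
        synchronized (lock) {
            // Another source reconnected recently: reuse its client.
            if (client != null && nowMs - lastReconnectMs < minIntervalMs) {
                return false;
            }
            // Close the old client before replacing it, so it cannot leak.
            if (client != null) {
                closed++;
            }
            client = new Object();
            opened++;
            lastReconnectMs = nowMs;
            return true;
        }
    }

    /** Number of clients currently open; stays at most 1 in this sketch. */
    int liveClients() {
        synchronized (lock) {
            return opened - closed;
        }
    }
}
```

With a 1000 ms interval, a burst of reconnect calls from several sources at roughly the same time results in a single new client; the rest return false and reuse it. Without the lock and the close-before-replace step, each caller could create its own client and overwrite the shared reference, which is the kind of leak described above.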
> Synchronize peer cluster replication connection attempts
> --------------------------------------------------------
>
> Key: HBASE-11963
> URL: https://issues.apache.org/jira/browse/HBASE-11963
> Project: HBase
> Issue Type: Sub-task
> Reporter: Andrew Purtell
> Assignee: Maddineni Sukumar
> Fix For: 2.0.0, 0.98.7, 0.94.24, 0.99.1
>
> Attachments: 11963-0.94.txt, HBASE-11963-0.98.patch, HBASE-11963.patch
>
>
> Synchronize peer cluster connection attempts to avoid races and rate limit
> connections when multiple replication sources try to connect to the peer
> cluster. If the peer cluster is down we can get out of control over time.