Andrew Purtell created HBASE-20597:
--------------------------------------
Summary: Use a lock to serialize access to a shared reference to
ZooKeeperWatcher in HBaseReplicationEndpoint
Key: HBASE-20597
URL: https://issues.apache.org/jira/browse/HBASE-20597
Project: HBase
Issue Type: Bug
Affects Versions: 1.4.4, 1.3.2
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Fix For: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 2.0.1, 1.4.5
The code that closes a ZooKeeperWatcher (ZKW) that fails to initialize when
attempting to connect to the remote cluster is not thread-safe and can in theory
leak ZooKeeperWatcher instances. The allocation of a new ZKW and the store to
the shared reference are not atomic, so concurrent allocations can occur with
only one store winning, and the instances that lose the race are never closed.
If the connection problem is persistent, such as loss of shared trust between
the clusters, unclosed ZKW instances may accumulate over time, each with a ZK
send thread and an event thread, until enough threads have leaked to cause an
OOME (cannot allocate native thread).
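
The fix named in the summary can be sketched roughly as follows. This is only an
illustration, not the actual HBaseReplicationEndpoint code: FakeWatcher is a
hypothetical stand-in for ZooKeeperWatcher that counts open instances, and
WatcherHolder, reload() and the lock field are names assumed for this example.
The point is that the close of the old instance and the store of the new one
happen under the same lock, so no concurrently allocated instance is dropped
without being closed.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for ZooKeeperWatcher; tracks how many instances are still open. */
class FakeWatcher implements Closeable {
  static final AtomicInteger OPEN = new AtomicInteger();
  private boolean closed;

  FakeWatcher() {
    OPEN.incrementAndGet();
  }

  @Override
  public synchronized void close() {
    // Idempotent close so a double close does not skew the counter.
    if (!closed) {
      closed = true;
      OPEN.decrementAndGet();
    }
  }
}

/** Hypothetical holder of the shared reference (assumed names, not HBase API). */
class WatcherHolder {
  private final Object lock = new Object();
  private FakeWatcher watcher; // guarded by lock

  FakeWatcher get() {
    synchronized (lock) {
      return watcher;
    }
  }

  /** Replaces the shared watcher; closing the old one and storing the new one are one atomic step. */
  void reload() {
    synchronized (lock) {
      if (watcher != null) {
        watcher.close();
      }
      watcher = new FakeWatcher();
    }
  }

  void close() {
    synchronized (lock) {
      if (watcher != null) {
        watcher.close();
        watcher = null;
      }
    }
  }
}
```

With this shape, any number of threads calling reload() at once leave exactly
one open instance behind. Without the lock, two threads could each allocate a
watcher and only the last store would remain reachable.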
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)