Lars Hofhansl created HBASE-11935:
-------------------------------------
Summary: Unbounded creation of Replication Failover workers
Key: HBASE-11935
URL: https://issues.apache.org/jira/browse/HBASE-11935
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Critical
Fix For: 2.0.0, 0.98.7, 0.94.24, 0.99.1
We just ran into a production incident with TCP SYN storms on port 2181
(ZooKeeper).
In our case the slave cluster was not running. When we bounced the primary
cluster we saw an "unbounded" number of failover threads, all hammering the
slave cluster's ZK hosts (which were not running ZK at the time). This caused
an overall degradation of network performance between datacenters.
Looking at the code we noticed that the thread pool handling for the Failover
workers probably does not behave as intended.
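For illustration only, here is a minimal sketch of what a bounded worker pool
can look like with java.util.concurrent. This is not the HBase code and not the
upcoming patch; the class name, the helper methods and the limit of 2 workers
are all hypothetical. The idea is that extra failover tasks wait in a queue
instead of each getting its own thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not part of HBase.
public class BoundedFailoverPool {

    // Builds a pool that never runs more than maxWorkers threads.
    // Core size == max size, so surplus tasks wait in the unbounded queue.
    static ThreadPoolExecutor newBoundedPool(int maxWorkers) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxWorkers, maxWorkers,
            100, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<Runnable>());
        // Let idle threads exit so the pool shrinks to zero when unused.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Submits `tasks` short tasks and returns the peak number that ran
    // at the same time. The sleep stands in for real failover work.
    static int peakConcurrency(int maxWorkers, int tasks)
            throws InterruptedException {
        ThreadPoolExecutor pool = newBoundedPool(maxWorkers);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrent workers: "
            + peakConcurrency(2, 50));
    }
}
```

With a pool like this, the number of threads contacting the remote ZK hosts at
any one time is capped by maxWorkers, no matter how many tasks are submitted.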
Patch coming soon.