[ https://issues.apache.org/jira/browse/HBASE-11935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jesse Yates updated HBASE-11935:
--------------------------------
    Attachment: hbase-11935-0.98-v1.patch

Patch for 0.98 w/ a debug-level log message every time we start a new ReplicationSource, listing the logs it's taking over.

> Unbounded creation of Replication Failover workers
> --------------------------------------------------
>
>                 Key: HBASE-11935
>                 URL: https://issues.apache.org/jira/browse/HBASE-11935
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.99.0, 2.0.0, 0.94.23, 0.98.6
>            Reporter: Lars Hofhansl
>            Assignee: Jesse Yates
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.7, 0.94.24, 0.99.1
>
>         Attachments: hbase-11935-0.98-v0.patch, hbase-11935-0.98-v1.patch,
>                      hbase-11935-trunk-v0.patch
>
>
> We just ran into a production incident with TCP SYN storms on port 2181
> (zookeeper).
> In our case the slave cluster was not running. When we bounced the primary
> cluster we saw an "unbounded" number of failover threads all hammering the
> hosts on the slave ZK machines (which did not run ZK at the time)... causing
> overall degradation of network performance between datacenters.
> Looking at the code we noticed that the thread pool handling of the Failover
> workers was probably unintended.
> Patch coming soon.
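
For readers unfamiliar with the failure mode in the description, here is a rough sketch of what a bounded worker pool looks like in plain java.util.concurrent. It is not the attached patch and not HBase's replication code: the class name BoundedFailoverPool, the method peakConcurrency, and the sleeping dummy task are all made up for illustration. The only point is that with core size equal to max size and a queue in front, extra tasks wait instead of each getting a thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not taken from HBase or from the patch.
public class BoundedFailoverPool {

    /**
     * Runs taskCount dummy "failover" tasks on a pool capped at maxWorkers
     * threads and returns the highest number observed running at once.
     */
    public static int peakConcurrency(int maxWorkers, int taskCount) {
        // core == max with an unbounded queue: tasks beyond maxWorkers
        // sit in the queue rather than spawning additional threads.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxWorkers, maxWorkers, 100, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true); // idle workers may exit

        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(taskCount);

        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // stand-in for a connection attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    done.countDown();
                }
            });
        }

        try {
            done.await(); // wait until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
            return peak.get();
        }
        pool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent workers: " + peakConcurrency(2, 10));
    }
}
```

However many tasks are submitted, the reported peak never exceeds maxWorkers. Whether the real fix caps the pool this way or changes how workers are scheduled is something to check in the patch itself.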