[ https://issues.apache.org/jira/browse/HBASE-11935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129254#comment-14129254 ]
Andrew Purtell commented on HBASE-11935:
----------------------------------------
TestReplicationThrottler passed 10 out of 10 times locally for me, e.g.
{noformat}
Running org.apache.hadoop.hbase.replication.regionserver.TestReplicationThrottler
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.537 sec - in org.apache.hadoop.hbase.replication.regionserver.TestReplicationThrottler
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
{noformat}
> Unbounded creation of Replication Failover workers
> --------------------------------------------------
>
> Key: HBASE-11935
> URL: https://issues.apache.org/jira/browse/HBASE-11935
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.99.0, 2.0.0, 0.94.23, 0.98.6
> Reporter: Lars Hofhansl
> Assignee: Jesse Yates
> Priority: Critical
> Fix For: 2.0.0, 0.98.7, 0.94.24, 0.99.1
>
> Attachments: hbase-11935-0.98-v0.patch, hbase-11935-0.98-v1.patch,
> hbase-11935-trunk-v0.patch, hbase-11935-trunk-v1.patch,
> hbase-11935-trunk-v2.patch
>
>
> We just ran into a production incident with TCP SYN storms on port 2181
> (ZooKeeper).
> In our case the slave cluster was not running. When we bounced the primary
> cluster we saw an "unbounded" number of failover threads all hammering the
> slave cluster's ZK hosts (which were not running ZK at the time), degrading
> overall network performance between datacenters.
> Looking at the code, we noticed that the thread pool handling of the failover
> workers was probably not what was intended.
> Patch coming soon.
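
For readers unfamiliar with the failure mode described above, here is a minimal sketch of what a bounded worker pool looks like in plain Java. It is not the attached patch and does not use any HBase class: the class name BoundedFailoverPool, the helper names, and the sleep that stands in for a failover job are all hypothetical. The only point it illustrates is that when failover work is submitted to a ThreadPoolExecutor with a fixed maximum size, the number of concurrently running workers cannot exceed that maximum, however many jobs are queued.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not HBase code and not the actual fix.
public class BoundedFailoverPool {

  /** Builds a pool that never runs more than maxWorkers threads at once. */
  static ThreadPoolExecutor newBoundedPool(int maxWorkers) {
    // core == max, so extra jobs wait in the queue instead of spawning threads.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        maxWorkers, maxWorkers, 100, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>());
    // Let idle workers exit so the pool does not pin threads forever.
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }

  /** Submits simulated failover jobs and returns the peak number running at once. */
  static int peakConcurrency(int maxWorkers, int tasks) throws InterruptedException {
    ThreadPoolExecutor pool = newBoundedPool(maxWorkers);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    CountDownLatch done = new CountDownLatch(tasks);
    for (int i = 0; i < tasks; i++) {
      pool.execute(() -> {
        int now = running.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
          Thread.sleep(5); // stand-in for a failover job (assumption)
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          running.decrementAndGet();
          done.countDown();
        }
      });
    }
    done.await();
    pool.shutdown();
    return peak.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("peak concurrent workers: " + peakConcurrency(2, 50));
  }
}
```

By contrast, starting each worker as its own thread rather than submitting it to the pool would bypass this limit entirely. Whether that is what happened in the replication code is not stated in the description above.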