[jira] [Commented] (HBASE-11935) Unbounded creation of Replication Failover workers

Andrew Purtell (JIRA) Wed, 10 Sep 2014 11:52:02 -0700

    [ https://issues.apache.org/jira/browse/HBASE-11935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128924#comment-14128924 ]


Andrew Purtell commented on HBASE-11935:
----------------------------------------

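We could get unbounded ReplicationSource allocation in 
ReplicationSourceManager.NodeFailoverWorker.run, because every deleted 
region server znode walks this call chain and ends in a new ReplicationSource: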
{noformat}
  ReplicationTrackerZKImpl.OtherRegionServerWatcher.nodeDeleted ->
      ReplicationSourceManager.regionServerRemoved ->
      ReplicationSourceManager.transferQueues ->
      NodeFailoverWorker.run ->
      ReplicationSourceManager.getReplicationSource ->
      new ReplicationSource
{noformat}
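
To illustrate the kind of bound being discussed, here is a minimal sketch of a failover pool with a hard cap on concurrent workers. It is not the HBase code and not the attached patch: BoundedFailoverSketch, newBoundedPool, peakConcurrency, and the pool and queue sizes are all hypothetical names and values chosen for the example, and the sleep merely stands in for the queue transfer work. It relies only on java.util.concurrent.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not HBase code: caps how many failover workers run at once.
public class BoundedFailoverSketch {

    // Builds a pool that never runs more than maxWorkers tasks concurrently
    // and never queues more than queueCapacity pending tasks.
    static ThreadPoolExecutor newBoundedPool(int maxWorkers, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxWorkers, maxWorkers,           // core == max, so the cap is explicit
            100, TimeUnit.MILLISECONDS,       // idle threads exit after 100 ms
            new ArrayBlockingQueue<Runnable>(queueCapacity));
        pool.allowCoreThreadTimeOut(true);    // let idle core threads exit too
        return pool;
    }

    // Simulates a burst of deadServers "region server removed" events and
    // returns the peak number of workers that were running at the same time.
    static int peakConcurrency(int deadServers, int maxWorkers, int queueCapacity) {
        ThreadPoolExecutor pool = newBoundedPool(maxWorkers, queueCapacity);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < deadServers; i++) {
            try {
                pool.execute(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20);     // stand-in for the queue transfer work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                });
            } catch (RejectedExecutionException e) {
                // Pool and queue are full. This sketch simply drops the event;
                // a real system would need to retry or hand the work elsewhere.
            }
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak workers: " + peakConcurrency(50, 2, 4));
    }
}
```

With core and maximum sizes equal and a bounded queue, the number of live workers cannot exceed maxWorkers no matter how many deletion events arrive in a burst. The trade-off is that saturated submissions are rejected, so the caller has to decide what to do with them.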

> Unbounded creation of Replication Failover workers
> --------------------------------------------------
>
>                 Key: HBASE-11935
>                 URL: https://issues.apache.org/jira/browse/HBASE-11935
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Jesse Yates
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.7, 0.94.24, 0.99.1
>
>         Attachments: hbase-11935-0.98-v0.patch
>
>
> We just ran into a production incident with TCP SYN storms on port 2181 
> (ZooKeeper).
> In our case the slave cluster was not running. When we bounced the primary 
> cluster we saw an "unbounded" number of failover threads all hammering the 
> slave cluster's ZK hosts (which were not running ZK at the time), causing 
> overall degradation of network performance between datacenters.
> Looking at the code we noticed that the thread pool handling of the Failover 
> workers was probably unintended.
> Patch coming soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
