[ https://issues.apache.org/jira/browse/HBASE-11935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-11935:
-----------------------------------
    Attachment: hbase-11935-trunk-v0.patch

Patch for trunk

Replication tests pass locally:
{noformat}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hbase.replication.TestMasterReplication
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.946 sec - in org.apache.hadoop.hbase.replication.TestMasterReplication
Running org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.37 sec - in org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
Running org.apache.hadoop.hbase.replication.TestPerTableCFReplication
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.85 sec - in org.apache.hadoop.hbase.replication.TestPerTableCFReplication
Running org.apache.hadoop.hbase.replication.TestReplicationChangingPeerRegionservers
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.904 sec - in org.apache.hadoop.hbase.replication.TestReplicationChangingPeerRegionservers
Running org.apache.hadoop.hbase.replication.TestReplicationDisableInactivePeer
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 34.352 sec - in org.apache.hadoop.hbase.replication.TestReplicationDisableInactivePeer
Running org.apache.hadoop.hbase.replication.TestReplicationEndpoint
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.599 sec - in org.apache.hadoop.hbase.replication.TestReplicationEndpoint
Running org.apache.hadoop.hbase.replication.TestReplicationKillMasterRS
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 50.607 sec - in org.apache.hadoop.hbase.replication.TestReplicationKillMasterRS
Running org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSCompressed
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.881 sec - in org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSCompressed
Running org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.436 sec - in org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS
Running org.apache.hadoop.hbase.replication.TestReplicationSmallTests
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.888 sec - in org.apache.hadoop.hbase.replication.TestReplicationSmallTests
Running org.apache.hadoop.hbase.replication.TestReplicationSource
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.846 sec - in org.apache.hadoop.hbase.replication.TestReplicationSource
Running org.apache.hadoop.hbase.replication.TestReplicationStateZKImpl
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.465 sec - in org.apache.hadoop.hbase.replication.TestReplicationStateZKImpl
Running org.apache.hadoop.hbase.replication.TestReplicationSyncUpTool
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.017 sec - in org.apache.hadoop.hbase.replication.TestReplicationSyncUpTool
Running org.apache.hadoop.hbase.replication.TestReplicationTrackerZKImpl
Tests run: 4, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 1.738 sec - in org.apache.hadoop.hbase.replication.TestReplicationTrackerZKImpl
Running org.apache.hadoop.hbase.replication.TestReplicationWALEntryFilters
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.122 sec - in org.apache.hadoop.hbase.replication.TestReplicationWALEntryFilters
Running org.apache.hadoop.hbase.replication.TestReplicationWithTags
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.966 sec - in org.apache.hadoop.hbase.replication.TestReplicationWithTags

Results :

Tests run: 40, Failures: 0, Errors: 0, Skipped: 2
{noformat}

> Unbounded creation of Replication Failover workers
> --------------------------------------------------
>
>                 Key: HBASE-11935
>                 URL: https://issues.apache.org/jira/browse/HBASE-11935
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.99.0, 2.0.0, 0.94.23, 0.98.6
>            Reporter: Lars Hofhansl
>            Assignee: Jesse Yates
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.7, 0.94.24, 0.99.1
>
>         Attachments: hbase-11935-0.98-v0.patch, hbase-11935-trunk-v0.patch
>
>
> We just ran into a production incident with TCP SYN storms on port 2181 (ZooKeeper).
> In our case the slave cluster was not running. When we bounced the primary cluster, we saw an "unbounded" number of failover threads all hammering the slave ZK hosts (which were not running ZK at the time), causing overall degradation of network performance between datacenters.
> Looking at the code, we noticed that the way the thread pool handles the Failover workers was probably unintended.
> Patch coming soon.
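The report does not describe the patch itself. As a generic illustration only, here is a minimal Java sketch of a `java.util.concurrent.ThreadPoolExecutor` that caps both the number of threads and the number of queued tasks, so a burst of failover work cannot spawn more threads than configured. The class `BoundedFailoverPool`, its limits, and the `run` helper are hypothetical; this is not the code in hbase-11935-trunk-v0.patch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedFailoverPool {
  // Hypothetical limits chosen for illustration; not HBase defaults.
  static final int MAX_WORKERS = 2;
  static final int QUEUE_CAPACITY = 3;

  // core == max caps live threads; the bounded queue caps pending work;
  // AbortPolicy rejects overflow instead of creating more threads.
  static ThreadPoolExecutor newBoundedPool() {
    return new ThreadPoolExecutor(
        MAX_WORKERS, MAX_WORKERS,
        60L, TimeUnit.SECONDS,
        new ArrayBlockingQueue<>(QUEUE_CAPACITY),
        new ThreadPoolExecutor.AbortPolicy());
  }

  // Submits `tasks` workers that block until released (standing in for
  // failover work stuck on an unreachable ZK quorum) and returns
  // {rejected task count, largest thread count ever observed}.
  static int[] run(int tasks) {
    ThreadPoolExecutor pool = newBoundedPool();
    CountDownLatch release = new CountDownLatch(1);
    int rejected = 0;
    for (int i = 0; i < tasks; i++) {
      try {
        pool.execute(() -> {
          try {
            release.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      } catch (RejectedExecutionException e) {
        rejected++;
      }
    }
    int largest = pool.getLargestPoolSize();
    release.countDown(); // let the blocked and queued tasks finish
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new int[] {rejected, largest};
  }

  public static void main(String[] args) {
    int[] result = run(10);
    System.out.println("rejected=" + result[0] + " largestPoolSize=" + result[1]);
  }
}
```

With 10 submissions, 2 tasks run, 3 wait in the queue, and the remaining 5 are rejected, so `main` prints `rejected=5 largestPoolSize=2`. Rejecting is only one option; a caller-runs policy or retry with backoff would also bound thread creation, and which suits replication failover is a design choice this sketch does not settle.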



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
