cuijianwei created HBASE-12770:
----------------------------------

             Summary: Don't transfer all the queued hlogs of a dead server to 
the same alive server
                 Key: HBASE-12770
                 URL: https://issues.apache.org/jira/browse/HBASE-12770
             Project: HBase
          Issue Type: Improvement
          Components: Replication
            Reporter: cuijianwei
            Priority: Minor


When a region server goes down (or the cluster restarts), all of its hlog 
queues are taken over by a single live region server. In a shared cluster, we 
might create several peers that replicate data to different peer clusters. 
Many hlogs can accumulate in the queues for these peers for several reasons: 
some peers might be disabled, errors from a peer cluster might block 
replication, or the replication sources might fail to read some hlogs because 
of HDFS problems. If the server then goes down or is restarted, one other live 
server takes over all the replication jobs of the dead server. This can put 
heavy pressure on the resources (network and disk reads) of that live server, 
and a single server cannot replicate the queued hlogs fast enough. If that 
live server goes down in turn, all of its replication jobs, including those it 
took over from other dead servers, are again transferred wholesale to another 
live server. As a result, one server can end up with a large number of queued 
hlogs (in our shared cluster, we have seen a single server with thousands of 
hlogs queued for replication).

As an alternative, would it be reasonable for a live server to transfer only 
one peer's hlogs from the dead server at a time? Other live region servers 
would then have the opportunity to transfer the hlogs of the remaining peers. 
This may also help the queued hlogs be processed faster. Any discussion is 
welcome.
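
To make the proposal concrete, here is a minimal simulation of the two 
strategies. It is only a sketch and does not use any HBase or ZooKeeper API: 
the function names, the peer and server ids, and the round-robin turn order 
are all assumptions made for illustration (real region servers would race to 
claim each queue rather than take turns).

```python
from collections import defaultdict


def claim_all(dead_queues, live_servers):
    """Behavior described in this issue: one live server takes every queue."""
    winner = live_servers[0]  # assumption: stands in for whoever wins the race
    return {winner: {peer: list(hlogs) for peer, hlogs in dead_queues.items()}}


def claim_one_peer_at_a_time(dead_queues, live_servers):
    """Proposed behavior: a server claims one peer's queue, then yields."""
    assignment = defaultdict(dict)
    for turn, peer in enumerate(sorted(dead_queues)):
        # assumption: servers take turns; this models "other servers get a
        # chance", not the actual claiming order in a real cluster
        server = live_servers[turn % len(live_servers)]
        assignment[server][peer] = list(dead_queues[peer])
    return dict(assignment)


def max_hlogs_on_one_server(assignment):
    """Largest number of queued hlogs any single live server must replicate."""
    return max(
        sum(len(hlogs) for hlogs in queues.values())
        for queues in assignment.values()
    )


if __name__ == "__main__":
    # Hypothetical queues left behind by a dead server, keyed by peer id.
    dead = {"peer1": ["h1", "h2", "h3"], "peer2": ["h4", "h5"], "peer3": ["h6"]}
    live = ["rs-a", "rs-b"]
    print(max_hlogs_on_one_server(claim_all(dead, live)))
    print(max_hlogs_on_one_server(claim_one_peer_at_a_time(dead, live)))
```

With per-peer claiming, the worst-case backlog on any one live server is 
bounded by how the peers happen to be spread out, rather than always being the 
dead server's entire backlog.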



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
