[ 
https://issues.apache.org/jira/browse/HBASE-12770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956380#comment-15956380
 ] 

Hudson commented on HBASE-12770:
--------------------------------

SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #146 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/146/])
HBASE-12770 Don't transfer all the queued hlogs of a dead server to the 
(antonov: rev fd297e280f25c26346c3343d6ea1be4f0362821e)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueuesZKImpl.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueues.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationStateBasic.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java


> Don't transfer all the queued hlogs of a dead server to the same alive server
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-12770
>                 URL: https://issues.apache.org/jira/browse/HBASE-12770
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Jianwei Cui
>            Assignee: Phil Yang
>            Priority: Minor
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-12770-branch-1-v1.patch, 
> HBASE-12770-branch-1-v2.patch, HBASE-12770-branch-1-v3.patch, 
> HBASE-12770-branch-1-v3.patch, HBASE-12770-branch-1-v3.patch, 
> HBASE-12770-branch-1-v3.patch, HBASE-12770-trunk.patch, HBASE-12770-v1.patch, 
> HBASE-12770-v2.patch, HBASE-12770-v3.patch, HBASE-12770-v3.patch
>
>
> When a region server is down(or the cluster restart), all the hlog queues 
> will be transferred by the same alive region server. In a shared cluster, we 
> might create several peers replicating data to different peer clusters. There 
> might be lots of hlogs queued for these peers caused by several reasons, such 
> as some peers might be disabled, or errors from peer cluster might prevent 
> the replication, or the replication sources may fail to read some hlog 
> because of hdfs problem. Then, if the server is down or restarted, another 
> alive server will take all the replication jobs of the dead server, this 
> might bring a big pressure to resources(network/disk read) of the alive 
> server and also is not fast enough to replicate the queued hlogs. And if the 
> alive server is down, all the replication jobs including that takes from 
> other dead servers will once again be totally transferred to another alive 
> server, this might cause a server have a large number of queued hlogs(in our 
> shared cluster, we find one server might have thousands of queued hlogs for 
> replication). As an optional way, is it reasonable that the alive server only 
> transfer one peer's hlogs from the dead server one time? Then, other alive 
> region servers might have the opportunity to transfer the hlogs of rest 
> peers. This may also help the queued hlogs be processed more fast. Any 
> discussion is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to