[ https://issues.apache.org/jira/browse/HBASE-12770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956279#comment-15956279 ]
Hudson commented on HBASE-12770:
--------------------------------
SUCCESS: Integrated in Jenkins build HBase-1.3-IT #19 (See
[https://builds.apache.org/job/HBase-1.3-IT/19/])
HBASE-12770 Don't transfer all the queued hlogs of a dead server to the
(antonov: rev fd297e280f25c26346c3343d6ea1be4f0362821e)
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueues.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueuesZKImpl.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationStateBasic.java
> Don't transfer all the queued hlogs of a dead server to the same alive server
> -----------------------------------------------------------------------------
>
> Key: HBASE-12770
> URL: https://issues.apache.org/jira/browse/HBASE-12770
> Project: HBase
> Issue Type: Improvement
> Components: Replication
> Affects Versions: 2.0.0, 1.4.0
> Reporter: Jianwei Cui
> Assignee: Phil Yang
> Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-12770-branch-1-v1.patch,
> HBASE-12770-branch-1-v2.patch, HBASE-12770-branch-1-v3.patch,
> HBASE-12770-branch-1-v3.patch, HBASE-12770-branch-1-v3.patch,
> HBASE-12770-branch-1-v3.patch, HBASE-12770-trunk.patch, HBASE-12770-v1.patch,
> HBASE-12770-v2.patch, HBASE-12770-v3.patch, HBASE-12770-v3.patch
>
>
> When a region server goes down (or the cluster restarts), all of its hlog
> queues are transferred to the same alive region server. In a shared cluster,
> we might create several peers replicating data to different peer clusters.
> Many hlogs might be queued for these peers for several reasons: some peers
> might be disabled, errors from a peer cluster might prevent replication, or
> the replication sources might fail to read some hlogs because of HDFS
> problems. Then, if the server goes down or is restarted, another alive
> server takes over all the replication jobs of the dead server. This can put
> heavy pressure on the resources (network, disk reads) of the alive server,
> and a single server also cannot replicate the queued hlogs fast enough. And
> if that alive server goes down too, all of its replication jobs, including
> those it took over from other dead servers, are once again transferred in
> full to another alive server. This can leave a single server with a large
> number of queued hlogs (in our shared cluster, we found that one server
> could have thousands of hlogs queued for replication). As an option, would
> it be reasonable for an alive server to transfer only one peer's hlogs from
> the dead server at a time? Other alive region servers would then have the
> opportunity to transfer the hlogs of the remaining peers. This may also help
> the queued hlogs be processed faster. Any discussion is welcome.
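
The proposal in the description can be illustrated with a minimal Java sketch. It assumes a hypothetical in-memory model: `DeadServerQueues`, `claimAll`, `claimOne`, and `distributeOnePeerAtATime` are illustrative names, not HBase's `ReplicationQueues` API and not the committed patch. The dead server's hlogs are grouped by peer ID, and each claim takes only one peer's queue, so several alive servers can share the failover work. The round-robin loop is an idealized stand-in for alive servers racing to claim queues.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of replication queue failover; not HBase's real API. */
public class QueueFailoverSketch {

    /** Queues left behind by one dead server: peer ID -> queued hlog names. */
    public static final class DeadServerQueues {
        private final Map<String, List<String>> queuesByPeer = new LinkedHashMap<>();

        public synchronized void add(String peerId, List<String> hlogs) {
            queuesByPeer.put(peerId, new ArrayList<>(hlogs));
        }

        /** Old behavior: the first claimer takes every peer's queue at once. */
        public synchronized Map<String, List<String>> claimAll() {
            Map<String, List<String>> all = new LinkedHashMap<>(queuesByPeer);
            queuesByPeer.clear();
            return all;
        }

        /** Proposed behavior: each claim takes at most one peer's queue. */
        public synchronized Optional<Map.Entry<String, List<String>>> claimOne() {
            Iterator<Map.Entry<String, List<String>>> it = queuesByPeer.entrySet().iterator();
            if (!it.hasNext()) {
                return Optional.empty();
            }
            Map.Entry<String, List<String>> first = it.next();
            String peerId = first.getKey();
            List<String> hlogs = first.getValue();
            it.remove(); // the queue now belongs to the claimer only
            return Optional.of(Map.entry(peerId, hlogs));
        }
    }

    /** Alive servers take turns; each turn claims one peer's queue. */
    public static Map<String, List<String>> distributeOnePeerAtATime(
            DeadServerQueues dead, List<String> aliveServers) {
        Map<String, List<String>> peersByServer = new LinkedHashMap<>();
        for (String rs : aliveServers) {
            peersByServer.put(rs, new ArrayList<>());
        }
        int turn = 0;
        Optional<Map.Entry<String, List<String>>> claimed;
        while ((claimed = dead.claimOne()).isPresent()) {
            String rs = aliveServers.get(turn % aliveServers.size());
            peersByServer.get(rs).add(claimed.get().getKey());
            turn++;
        }
        return peersByServer;
    }

    public static void main(String[] args) {
        DeadServerQueues dead = new DeadServerQueues();
        dead.add("peer1", List.of("hlog.1", "hlog.2"));
        dead.add("peer2", List.of("hlog.3"));
        dead.add("peer3", List.of("hlog.4", "hlog.5", "hlog.6"));

        List<String> alive = List.of("rs-a", "rs-b", "rs-c");
        // prints {rs-a=[peer1], rs-b=[peer2], rs-c=[peer3]}
        System.out.println(distributeOnePeerAtATime(dead, alive));
    }
}
```

Under the old claim-all behavior, one `claimAll()` call would instead hand all three peer queues to whichever server claimed first, which is the concentration the description warns about.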
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)