[ 
https://issues.apache.org/jira/browse/HBASE-12770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-12770:
-----------------------------------
    Fix Version/s: 1.4.0

> Don't transfer all the queued hlogs of a dead server to the same alive server
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-12770
>                 URL: https://issues.apache.org/jira/browse/HBASE-12770
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Jianwei Cui
>            Assignee: Phil Yang
>            Priority: Minor
>             Fix For: 2.0.0, 1.4.0, 1.3.1
>
>         Attachments: HBASE-12770-branch-1-v1.patch, 
> HBASE-12770-branch-1-v2.patch, HBASE-12770-branch-1-v3.patch, 
> HBASE-12770-branch-1-v3.patch, HBASE-12770-branch-1-v3.patch, 
> HBASE-12770-branch-1-v3.patch, HBASE-12770-trunk.patch, HBASE-12770-v1.patch, 
> HBASE-12770-v2.patch, HBASE-12770-v3.patch, HBASE-12770-v3.patch
>
>
> When a region server goes down (or the cluster restarts), all of its hlog 
> queues are transferred to a single alive region server. In a shared cluster, 
> we might create several peers that replicate data to different peer clusters. 
> Many hlogs may be queued for these peers for several reasons: some peers 
> might be disabled, errors from a peer cluster might block replication, or 
> the replication sources might fail to read some hlogs because of HDFS 
> problems. If the server then goes down or is restarted, another alive server 
> takes over all the replication jobs of the dead server. This can put heavy 
> pressure on the resources (network, disk reads) of the alive server, and a 
> single server cannot replicate the queued hlogs quickly enough. If that alive 
> server also goes down, all of its replication jobs, including those taken 
> over from other dead servers, are again transferred in full to another alive 
> server. As a result, one server can end up with a large number of queued 
> hlogs (in our shared cluster, we have seen a single server holding thousands 
> of hlogs queued for replication). As an alternative, would it be reasonable 
> for an alive server to transfer only one peer's hlogs from the dead server at 
> a time? Other alive region servers would then have the opportunity to 
> transfer the hlogs of the remaining peers. This may also help the queued 
> hlogs be processed faster. Any discussion is welcome.
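
To make the proposal above concrete, here is a minimal simulation sketch. It is
not HBase code and not the attached patch. The function names, server names,
and peer IDs are hypothetical, and round-robin turn-taking is an assumption
standing in for however alive servers actually compete to claim queues. It only
compares the hlog load per alive server under the two policies described in the
issue.

```python
from itertools import cycle


def claim_all(dead_queues, alive_servers):
    """Behaviour described in the issue: one alive server takes every queue."""
    winner = alive_servers[0]  # assumption: the first server wins the claim
    return {winner: {peer: list(hlogs) for peer, hlogs in dead_queues.items()}}


def claim_one_peer_per_turn(dead_queues, alive_servers):
    """Proposed behaviour: each alive server claims one peer's queue per turn."""
    claimed = {server: {} for server in alive_servers}
    turns = cycle(alive_servers)  # assumption: servers take turns round-robin
    for peer_id, hlogs in dead_queues.items():
        claimed[next(turns)][peer_id] = list(hlogs)
    return claimed


def hlog_load(claimed):
    """Count the queued hlogs each alive server is now responsible for."""
    return {
        server: sum(len(hlogs) for hlogs in queues.values())
        for server, queues in claimed.items()
    }


# Hypothetical queues left behind by a dead region server, keyed by peer ID.
dead_queues = {
    "peer1": ["hlog.1", "hlog.2", "hlog.3"],
    "peer2": ["hlog.1", "hlog.2"],
    "peer3": ["hlog.3"],
}
alive = ["rs-a", "rs-b"]

print(hlog_load(claim_all(dead_queues, alive)))                # {'rs-a': 6}
print(hlog_load(claim_one_peer_per_turn(dead_queues, alive)))  # {'rs-a': 4, 'rs-b': 2}
```

With one peer claimed per turn, the six queued hlogs in this example are spread
over both alive servers instead of landing on a single one.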



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
