[ https://issues.apache.org/jira/browse/HBASE-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260100#comment-14260100 ]

cuijianwei commented on HBASE-12769:
------------------------------------

As a way to solve this problem, we might need to:
1. set the peer to a "REMOVING" state when the client decides to remove it;
2. have the client (or a worker of the master) delete the hlog queues of dead 
servers which have not been transferred;
3. delete the peerId zk node under peersZNode only after all corresponding zk 
nodes have been deleted.
If the removal process fails, the peer is left in the "REMOVING" state, and an 
attempt to add a new peer with the same peerId should fail. A rough sketch of 
this flow is below.
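
The sketch assumes a raw ZooKeeper client and the default 
/hbase/replication/peers and /hbase/replication/rs layout; the "REMOVING" value 
written into the peer-state child and the helper names are only illustrations 
of the proposal, not existing HBase APIs.

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Sketch of the proposed remove_peer flow: mark the peer REMOVING, clean up
// every hlog queue referencing it (including queues of dead servers that were
// never transferred), and only then delete the peer znode itself.
public class RemovePeerSketch {
  private static final String PEERS = "/hbase/replication/peers";
  private static final String RS = "/hbase/replication/rs";

  public static void removePeer(ZooKeeper zk, String peerId)
      throws KeeperException, InterruptedException {
    // 1. Mark the peer as REMOVING so a concurrent add_peer with the same
    //    peerId can be rejected. (Writing "REMOVING" into the peer-state
    //    child is the proposed addition, not current behaviour.)
    zk.setData(PEERS + "/" + peerId + "/peer-state", "REMOVING".getBytes(), -1);

    // 2. Delete the hlog queues of this peer under every region server znode,
    //    including queues of dead servers that have not been transferred.
    for (String rs : zk.getChildren(RS, false)) {
      for (String queueId : zk.getChildren(RS + "/" + rs, false)) {
        // Queues claimed from dead servers are named "<peerId>-<deadServer>",
        // so match by prefix as well as by exact peerId.
        if (queueId.equals(peerId) || queueId.startsWith(peerId + "-")) {
          deleteRecursively(zk, RS + "/" + rs + "/" + queueId);
        }
      }
    }

    // 3. Only after every queue znode is gone, delete the peerId znode.
    deleteRecursively(zk, PEERS + "/" + peerId);
  }

  private static void deleteRecursively(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    for (String child : zk.getChildren(path, false)) {
      deleteRecursively(zk, path + "/" + child);
    }
    zk.delete(path, -1);
  }
}
{code}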

> Replication fails to delete all corresponding zk nodes when peer is removed
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-12769
>                 URL: https://issues.apache.org/jira/browse/HBASE-12769
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 0.99.2
>            Reporter: cuijianwei
>            Priority: Minor
>
> When removing a peer, the client side deletes the peerId znode under 
> peersZNode; the live region servers are then notified and delete the 
> corresponding hlog queues under their rsZNode of replication. However, if 
> there are failed servers whose hlog queues have not been transferred by live 
> servers (which is likely when "replication.sleep.before.failover" is set to a 
> big value and lots of region servers restart), these hlog queues won't be 
> deleted after the peer is removed. I think remove_peer should guarantee all 
> corresponding zk nodes have been removed after it completes; otherwise, if we 
> create a new peer with the same peerId as the removed one, unexpected data 
> might be replicated.
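
To illustrate the last point, a hypothetical guard that add_peer could run 
before reusing a peerId might look like the following; it only scans the 
default /hbase/replication/rs layout, and the class and method names are 
assumptions for illustration, not existing HBase code.

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical pre-check for add_peer: refuse to reuse a peerId while any
// region server znode still holds an hlog queue referencing it, since such a
// stale queue would ship old hlogs to the newly added peer.
public class PeerIdReuseCheck {
  private static final String RS = "/hbase/replication/rs";

  public static boolean hasLeftoverQueues(ZooKeeper zk, String peerId)
      throws KeeperException, InterruptedException {
    for (String rs : zk.getChildren(RS, false)) {
      for (String queueId : zk.getChildren(RS + "/" + rs, false)) {
        // Queues claimed from dead servers are named "<peerId>-<deadServer>".
        if (queueId.equals(peerId) || queueId.startsWith(peerId + "-")) {
          return true;
        }
      }
    }
    return false;
  }
}
{code}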


