[ https://issues.apache.org/jira/browse/HBASE-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260327#comment-14260327 ]
Andrew Purtell commented on HBASE-12769:
----------------------------------------
Would it also make sense to teach HBCK about replication? The suggestions
above make sense, but we'd need a tool to resolve peer entries that get stuck
in the REMOVING state for some reason.
> Replication fails to delete all corresponding zk nodes when peer is removed
> ---------------------------------------------------------------------------
>
> Key: HBASE-12769
> URL: https://issues.apache.org/jira/browse/HBASE-12769
> Project: HBase
> Issue Type: Improvement
> Components: Replication
> Affects Versions: 0.99.2
> Reporter: cuijianwei
> Priority: Minor
>
> When a peer is removed, the client deletes the peerId node under peersZNode;
> live region servers are then notified and delete the corresponding hlog
> queues under their own rsZNode of replication. However, if there are failed
> servers whose hlog queues have not yet been taken over by live servers (this
> is likely when "replication.sleep.before.failover" is set to a large value
> and many region servers restart), those hlog queues are not deleted after
> the peer is removed. I think remove_peer should guarantee that all
> corresponding zk nodes have been removed by the time it completes; otherwise,
> if we create a new peer with the same peerId as the removed one, unexpected
> data might be replicated.
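
To make the leftover-queue case above concrete, here is a minimal sketch of the check that remove_peer (or an HBCK-style tool) would need. It is not HBase code: OrphanedQueueFinder, belongsToPeer and findOrphans are hypothetical names, the znode tree is modeled as an in-memory map rather than read from ZooKeeper, and it assumes a queue id is either "<peerId>" or, for a queue taken over from a failed server, "<peerId>-<deadServerName>", with no '-' in the peer id itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of HBase: lists replication queues that still
// reference a peer id after the peer has been removed.
class OrphanedQueueFinder {

    // Assumption: a queue id is "<peerId>" for a server's own queue, or
    // "<peerId>-<deadServerName>..." for a queue taken over from a failed server.
    static boolean belongsToPeer(String queueId, String peerId) {
        return queueId.equals(peerId) || queueId.startsWith(peerId + "-");
    }

    // queuesByServer models the queue znodes found under each region server's
    // node below rsZNode (server name -> queue ids). Returns "server/queueId"
    // paths for every queue of the given peer, ordered by server name.
    static List<String> findOrphans(Map<String, List<String>> queuesByServer,
                                    String peerId) {
        List<String> orphans = new ArrayList<>();
        // TreeMap copy gives a deterministic iteration order.
        for (Map.Entry<String, List<String>> e
                : new TreeMap<>(queuesByServer).entrySet()) {
            for (String queueId : e.getValue()) {
                if (belongsToPeer(queueId, peerId)) {
                    orphans.add(e.getKey() + "/" + queueId);
                }
            }
        }
        return orphans;
    }
}
```

The scan covers every server node it is given, including those of failed servers, which is the case the live-server notification path misses. A real tool would read the children of rsZNode from ZooKeeper and delete whatever this reports before a peer id is reused.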
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)