[ https://issues.apache.org/jira/browse/HBASE-19216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250989#comment-16250989 ]
stack commented on HBASE-19216:
-------------------------------
Sounds fine. The steps you describe resemble how region open/close works
(AssignProcedure and UnassignProcedure), except that you fan out to all
RegionServers and then they all phone home to the Master when done?
What if all RS fail to report in? Procedures need to complete. Could the
procedure 'fail' and we then do 'rollback/cleanup'?
What if an RS dies? ServerCrashProcedure cleans up stuck Assign/Unassigns. We
could do something similar here.
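One way to picture the fan-out, the failure path, and the dead-server case raised above is the small Java sketch below. It is only an illustration: FanOutTracker and its methods are hypothetical names and are not part of HBase or its Procedure framework. It also assumes that a crashed server can simply be dropped from the pending set, which may not be the right rule for every peer operation.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of tracking a fan-out; not HBase's Procedure API. */
final class FanOutTracker {
    // Servers that were sent the request and have not reported back yet.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    // Completes normally once no server is pending, or exceptionally on failure.
    private final CompletableFuture<Void> done = new CompletableFuture<>();

    FanOutTracker(Set<String> servers) {
        pending.addAll(servers);
        completeIfNonePending();
    }

    /** A RegionServer "phones home" to say it has applied the change. */
    void reportDone(String server) {
        pending.remove(server);
        completeIfNonePending();
    }

    /** Assumption: a dead server no longer needs to apply the change, so stop waiting on it. */
    void serverCrashed(String server) {
        pending.remove(server);
        completeIfNonePending();
    }

    /** Gives up (for example after a deadline) so the caller can roll back or clean up. */
    void fail(Throwable cause) {
        done.completeExceptionally(cause);
    }

    CompletableFuture<Void> whenDone() {
        return done;
    }

    private void completeIfNonePending() {
        if (pending.isEmpty()) {
            done.complete(null); // no effect if already completed or failed
        }
    }

    public static void main(String[] args) {
        FanOutTracker tracker = new FanOutTracker(Set.of("rs1", "rs2", "rs3"));
        tracker.whenDone().whenComplete((ignored, err) ->
            System.out.println(err == null
                ? "all live servers reported in"
                : "failed, rolling back: " + err.getMessage()));
        tracker.reportDone("rs1");
        tracker.serverCrashed("rs2"); // stands in for the crash-handling cleanup
        tracker.reportDone("rs3");
    }
}
```

The point of the sketch is that the operation always reaches a terminal state: every server reports in, a dead server is removed from the wait set, or the caller fails it explicitly and cleans up.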
Currently, open region (AssignProcedure) schedules a remote request. The
Procedure then suspends itself. The RS RPCs to the Master to tell it the Region
is open. The Master then 'wakes up' the suspended procedure to proceed (or fail).
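The suspend and wake-up flow just described can be sketched roughly as follows. This is a single-threaded toy model, not the real AssignProcedure or ProcedureExecutor: SuspendWakeSketch, OpenRegionProc, and the State values are all made-up names, and the remote request is only indicated by a comment.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical toy executor illustrating suspend and wake-up; not HBase code. */
final class SuspendWakeSketch {
    enum State { DISPATCH, WAITING, FINISHED, FAILED }

    static final class OpenRegionProc {
        final String region;
        State state = State.DISPATCH;
        boolean opened; // result carried by the RegionServer's report
        OpenRegionProc(String region) { this.region = region; }
    }

    private final Queue<OpenRegionProc> runnable = new ArrayDeque<>();
    private final Map<String, OpenRegionProc> suspended = new HashMap<>();

    void submit(OpenRegionProc proc) {
        runnable.add(proc);
    }

    /** Executes one step of every procedure that is currently runnable. */
    void runOnce() {
        OpenRegionProc proc;
        while ((proc = runnable.poll()) != null) {
            switch (proc.state) {
                case DISPATCH:
                    // The remote request to the RegionServer would be scheduled here.
                    proc.state = State.WAITING;
                    suspended.put(proc.region, proc); // suspend: parked, not re-queued
                    break;
                case WAITING:
                    // Woken by a report: proceed or fail based on the result.
                    proc.state = proc.opened ? State.FINISHED : State.FAILED;
                    break;
                default:
                    break;
            }
        }
    }

    /** The RegionServer's report to the Master; wakes the matching procedure. */
    void report(String region, boolean opened) {
        OpenRegionProc proc = suspended.remove(region);
        if (proc == null) {
            return; // nothing is waiting on this region
        }
        proc.opened = opened;
        runnable.add(proc);
    }

    public static void main(String[] args) {
        SuspendWakeSketch executor = new SuspendWakeSketch();
        OpenRegionProc proc = new OpenRegionProc("region-1");
        executor.submit(proc);
        executor.runOnce();                 // dispatches, then suspends
        executor.report("region-1", true);  // report arrives, procedure is woken
        executor.runOnce();                 // procedure resumes and finishes
        System.out.println(proc.state);
    }
}
```

A suspended procedure sits in a map rather than on the run queue, so nothing blocks while the report is outstanding.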
When do you need this by?
> Use procedure to execute replication peer related operations
> ------------------------------------------------------------
>
> Key: HBASE-19216
> URL: https://issues.apache.org/jira/browse/HBASE-19216
> Project: HBase
> Issue Type: Improvement
> Reporter: Duo Zhang
>
> When building the basic framework for HBASE-19064, I found that
> enable/disable peer is built upon zk watchers.
> The problem with using a watcher is that you do not know the exact time when
> all RSes in the cluster have applied the change; it is only 'eventually done'.
> For synchronous replication, when changing the state of a replication
> peer, we need to know the exact time, as we can only enable read/write after
> that time. So I think we'd better use a procedure to do this: change the flag
> on zk, and then execute a procedure on all RSes to reload the flag from zk.
> Another benefit is that, after the change, zk will be used mainly as
> storage, so it will be easy to implement another replication peer storage to
> replace zk and thereby reduce the dependency on zk.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)