[ 
https://issues.apache.org/jira/browse/HBASE-19216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256287#comment-16256287
 ] 

stack commented on HBASE-19216:
-------------------------------

bq. For a peer change, I think it is idempotent, so we can retry forever if an 
RS fails to report in.

Ok. We just need to stop pinging if the server goes away.
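
A minimal sketch of that retry rule in Java, assuming the peer refresh is idempotent as stated above. The names SketchRetry, pingUntilDone, serverOnline and sendRefresh are hypothetical and are not HBase APIs; the two suppliers stand in for a server liveness check and for the refresh RPC.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HBase: retry an idempotent request
// for as long as the target server is still a cluster member.
final class SketchRetry {
  // Returns true once the server acknowledged the request, false if the
  // server went away (or the caller was interrupted) before it did.
  static boolean pingUntilDone(BooleanSupplier serverOnline,
                               BooleanSupplier sendRefresh,
                               long backoffMillis) {
    while (serverOnline.getAsBoolean()) {
      if (sendRefresh.getAsBoolean()) {
        return true; // acknowledged; safe because the request is idempotent
      }
      try {
        Thread.sleep(backoffMillis); // fixed backoff between attempts
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false; // server is gone, stop pinging
  }
}
```

The loop checks liveness before every attempt, so a dead server ends the retries instead of being pinged forever.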

bq. I plan to add a reportProcedureDone method in RegionServerStatusService

Ok. That should work for a few procedure types.

bq. How can I wake up a suspended procedure?

In Assign/Unassign, each RegionStateNode (RSN) holds a reference to the 
Procedure that is operating on the region and an associated ProcedureEvent 
(PE). Suspend/resume operates on the RSN's PE. Before we dispatch an RPC, we 
suspend on the RSN's PE. When the RS has transitioned the Region, it updates 
the Master by calling reportRegionStateTransition. The Master finds the 
pertinent RSN using RegionInfo as the key, pulls out the Procedure, and calls 
reportTransition on it. After the state in the Procedure is updated, the last 
step is a wake-up call on the PE.
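
The suspend/report/wake sequence can be sketched roughly as below. This is a simplified model, not the HBase implementation: every class here (SketchEvent, SketchProcedure, SketchRegionNode, SketchMaster) is a hypothetical stand-in, the region is keyed by a plain string rather than RegionInfo, and the run queue is a bare ArrayDeque with no thread safety.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a ProcedureEvent: parks procedures until woken.
final class SketchEvent {
  private boolean ready = true;
  private final Queue<SketchProcedure> waiters = new ArrayDeque<>();

  synchronized void suspend() { ready = false; }

  // Parks the procedure and returns true if the event is not ready.
  synchronized boolean suspendIfNotReady(SketchProcedure proc) {
    if (ready) return false;
    waiters.add(proc);
    return true;
  }

  // Marks the event ready and hands parked procedures back to the run queue.
  synchronized void wake(Queue<SketchProcedure> runQueue) {
    ready = true;
    runQueue.addAll(waiters);
    waiters.clear();
  }
}

final class SketchProcedure {
  final String regionName;
  String reportedState = "PENDING";
  SketchProcedure(String regionName) { this.regionName = regionName; }
  void reportTransition(String newState) { reportedState = newState; }
}

// Hypothetical stand-in for a RegionStateNode: one event plus the current procedure.
final class SketchRegionNode {
  final SketchEvent event = new SketchEvent();
  volatile SketchProcedure procedure;
}

final class SketchMaster {
  final Map<String, SketchRegionNode> nodes = new ConcurrentHashMap<>();
  final Queue<SketchProcedure> runQueue = new ArrayDeque<>();

  // Suspend on the node's event first; the RPC to the RS would follow.
  void dispatch(SketchProcedure proc) {
    SketchRegionNode node =
        nodes.computeIfAbsent(proc.regionName, k -> new SketchRegionNode());
    node.procedure = proc;
    node.event.suspend();
    node.event.suspendIfNotReady(proc);
  }

  // The RS reports back: update the procedure, then wake it as the last step.
  boolean reportRegionStateTransition(String regionName, String newState) {
    SketchRegionNode node = nodes.get(regionName);
    if (node == null || node.procedure == null) return false;
    node.procedure.reportTransition(newState);
    node.event.wake(runQueue);
    return true;
  }
}
```

Waking only after the state update means the procedure always sees the reported state when it runs again.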

We'd have a registry of Peers in the Master (ReplicationPeers?), keyed by 
peerid? The Peer in the Master would carry the Procedure and PE references.

Something like that.

bq. I need to create one by myself when suspending the procedure and store it 
in the procedure, so I can get it through the procedureId?

When we create a Peer, it would have a PE in it. The PE would not be created 
each time we want to suspend, because we want to guard against having more 
than one operation going on against a Peer at a time. The key could be 
procedureid, but could it be peerid instead?
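
One way to picture a peerid-keyed registry with a single long-lived slot per Peer is the sketch below. It is only an illustration under assumptions: SketchPeer and SketchPeerRegistry are hypothetical names, and the ownership slot (an AtomicLong holding the current procedure id) stands in for the Procedure and PE references the Peer would carry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical master-side Peer entry, created once when the peer is added.
final class SketchPeer {
  private static final long NONE = -1L;
  final String peerId;
  // Id of the procedure currently operating on this peer, or NONE.
  private final AtomicLong owner = new AtomicLong(NONE);

  SketchPeer(String peerId) { this.peerId = peerId; }

  // Succeeds only if no other procedure is operating on this peer.
  boolean tryAcquire(long procId) { return owner.compareAndSet(NONE, procId); }

  // Only the owning procedure can release the peer.
  boolean release(long procId) { return owner.compareAndSet(procId, NONE); }
}

// Hypothetical registry keyed by peer id rather than by procedure id.
final class SketchPeerRegistry {
  private final Map<String, SketchPeer> peers = new ConcurrentHashMap<>();

  SketchPeer addPeer(String peerId) {
    return peers.computeIfAbsent(peerId, SketchPeer::new);
  }

  SketchPeer get(String peerId) { return peers.get(peerId); }
}
```

Because the slot lives with the Peer and not with each suspend, a second procedure against the same peer id is refused until the first one releases it.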




So, setting peer would work like 


> Use procedure to execute replication peer related operations
> ------------------------------------------------------------
>
>                 Key: HBASE-19216
>                 URL: https://issues.apache.org/jira/browse/HBASE-19216
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> When building the basic framework for HBASE-19064, I found that 
> enabling/disabling a peer is built upon a zk watcher.
> The problem with using a watcher is that you do not know the exact time when 
> all RSes in the cluster have applied the change; it is only 'eventually done'. 
> And for synchronous replication, when changing the state of a replication 
> peer, we need to know the exact time as we can only enable read/write after 
> that time. So I think we'd better use procedure to do this. Change the flag 
> on zk, and then execute a procedure on all RSes to reload the flag from zk.
> Another benefit is that, after the change, zk will be mainly used as a 
> storage, so it will be easy to implement another replication peer storage to 
> replace zk so that we can reduce the dependency on zk.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
