[ https://issues.apache.org/jira/browse/GEODE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339313#comment-17339313 ]

Xiaojian Zhou edited comment on GEODE-9191 at 5/22/21, 1:06 AM:
----------------------------------------------------------------

Further investigation found that the primary buckets could switch at any time,
especially when they are not balanced (which usually happens during GII). We
need to lock the primaries so they cannot move.

The revised design is as follows (a rough coordinator-side sketch is given after
the list):
(1) The coordinator (a server) calls assignAllBuckets.
(2) The coordinator sends a lock message to all members.
(3) Upon receiving the lock message, each datastore server:
- waits for all the primaries to show up
- iterates through its local primary bucket list, locking each primary from
moving and locking its RVV
- replies with the number of buckets it locked
(4) The coordinator sums the locked-bucket counts from all members. If the sum
matches totalBucketNumber, it goes on to clear the region. Otherwise, it unlocks
and retries the whole PR clear. The retry repeats until it succeeds, i.e. until
every primary bucket has its RVV locked and is locked against primary movement.
(5) If a member goes down, the membership listener detects it and lets the
coordinator retry. If too many members are down, waiting for primaries should
fail with PartitionedRegionPartialClearException; the coordinator then unlocks
and throws this exception to the caller.
(6) After all members' primary buckets are locked, the coordinator sends a clear
message to all members.
(7) Each member clears its primary buckets one by one and returns the number of
buckets cleared.
(8) The coordinator sums the cleared-bucket counts; if the total is less than
the expected bucket number, it throws PartitionedRegionPartialClearException to
the caller. This can happen when a member goes offline in the middle of the
clear.
(9) If any member exits in the middle of the clear, the membership listener at
the coordinator is notified. It unlocks all the locks and retries from locking
through clearing. On retry, if the missing member's buckets have been recreated
on another member, the retry succeeds. Otherwise, the total cleared-bucket count
is still lower than expected (i.e. a partition went offline), so throw
PartitionedRegionPartialClearException.
(10) If the coordinator exits in the middle of the clear, unlock all the locks
and throw PartitionedRegionPartialClearException.
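
The coordinator side of steps (1)-(10) is essentially a lock / verify / clear /
verify loop. Below is a minimal sketch under that reading; sendLockMessage,
sendClearMessage and unlockAll are hypothetical placeholders for the real
lock/clear messaging (they are not Geode APIs), and the membership-listener
retry paths of steps (5), (9) and (10) are left out.

import org.apache.geode.cache.Region;
import org.apache.geode.cache.partition.PartitionRegionHelper;

// Rough, hypothetical sketch of the coordinator-side loop; not the actual
// implementation.
abstract class PrClearCoordinatorSketch {

  abstract int sendLockMessage(Region<?, ?> pr);   // sum of per-member locked-bucket replies, step (3)
  abstract int sendClearMessage(Region<?, ?> pr);  // sum of per-member cleared-bucket replies, step (7)
  abstract void unlockAll(Region<?, ?> pr);

  void clearPartitionedRegion(Region<?, ?> pr) {
    // (1) make sure every bucket exists and has a primary before locking
    PartitionRegionHelper.assignBucketsToPartitions(pr);
    int totalBuckets = pr.getAttributes().getPartitionAttributes().getTotalNumBuckets();

    while (true) {
      // (2)-(4) lock every primary (RVV + primary movement) on all datastores
      int locked = sendLockMessage(pr);
      if (locked != totalBuckets) {
        unlockAll(pr);   // a primary moved or a member left; retry the whole PR clear
        continue;
      }

      // (6)-(7) all primaries are locked; clear bucket by bucket
      int cleared = sendClearMessage(pr);
      unlockAll(pr);

      if (cleared < totalBuckets) {
        // (8) a member went offline mid-clear and its buckets were not recreated
        throw new RuntimeException(  // stands in for PartitionedRegionPartialClearException
            "cleared " + cleared + " of " + totalBuckets + " buckets");
      }
      return;  // every bucket was cleared while locked as primary
    }
  }
}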



was (Author: zhouxj):
Further investigation found that the primary buckets could switch at any time,
especially when they are not balanced (which usually happens during GII). We
need to lock the primaries so they cannot move.

The revised design will be:
(1) The coordinator (a server) calls assignAllBuckets, then waits for all the
primaries to show up.
(2) The coordinator sends a lock message to all members.
(3) Upon receiving the lock message, each datastore server saves the current
primary bucket count for future reference.
(4) Each datastore iterates through its local primary bucket list, locking each
primary from moving and locking its RVV. If either the total of locked primary
buckets or the total of locked-RVV buckets at this member differs from the
previously saved primary bucket count, it unlocks all of them and returns a
RetryException to the coordinator.
(5) If the coordinator receives a retry exception, it resends the lock message
and retries until it succeeds.
(6) After all members' primary buckets are locked, the coordinator sends a clear
message to all members.
(7) Each member clears its primary buckets one by one and returns the number of
buckets cleared.
(8) The coordinator sums the cleared-bucket counts; if the total is less than
the expected bucket number, it throws PartitionedRegionPartialClearException to
the caller. This can happen when a member goes offline in the middle of the
clear.
(9) If any member exits in the middle of the clear, the membership listener at
the coordinator is notified. It unlocks all the locks and retries from locking
through clearing. On retry, if the missing member's buckets have been recreated
on another member, the retry succeeds. Otherwise, the total cleared-bucket count
is still lower than expected (i.e. a partition went offline), so throw
PartitionedRegionPartialClearException.
(10) If the coordinator exits in the middle of the clear, unlock all the locks
and throw PartitionedRegionPartialClearException.


> PR clear could miss clearing bucket which lost primary
> ------------------------------------------------------
>
>                 Key: GEODE-9191
>                 URL: https://issues.apache.org/jira/browse/GEODE-9191
>             Project: Geode
>          Issue Type: Sub-task
>            Reporter: Xiaojian Zhou
>            Assignee: Xiaojian Zhou
>            Priority: Major
>              Labels: GeodeOperationAPI, pull-request-available
>
> This scenario was found when introducing a GII test case for PR clear. The 
> sequence is:
> (1) There are 3 servers: server1 is an accessor, server2 and server3 are 
> datastores.
> (2) Shut down server2.
> (3) Send PR clear from server1 (the accessor) and restart server2 at the same 
> time. There is a race in which server2 does not receive the 
> PartitionedRegionClearMessage.
> (4) server2 finishes GII.
> (5) Only server3 received the PartitionedRegionClearMessage, and it hosts all 
> the primary buckets. While the PR clear thread iterates through these primary 
> buckets one by one, some of them may lose primary to server2. 
> (6) BR.cmnClearRegion returns immediately since the bucket is no longer 
> primary, but clearedBuckets.add(localPrimaryBucketRegion.getId()); is still 
> called. So from the caller's point of view, this bucket is cleared, and no 
> PartitionedRegionPartialClearException is even thrown.
> The problem is that cmnClearRegion is called without re-checking primary 
> ownership. Before calling cmnClearRegion, we should call BR.doLockForPrimary 
> to make sure the bucket is still primary; if it is not, throw an exception so 
> that clearedBuckets.add(localPrimaryBucketRegion.getId()); is not called for 
> this bucket.
> The expected behavior in this scenario is to throw 
> PartitionedRegionPartialClearException.
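
A minimal sketch of the per-bucket check called for in the issue above. Only
BucketRegion (BR), doLockForPrimary, cmnClearRegion and
clearedBuckets.add(localPrimaryBucketRegion.getId()) are named in the issue; the
method signatures, arguments and exception type below are assumptions, not the
actual Geode code.

// Schematic fragment: re-verify primary ownership before clearing so that a
// bucket which lost primary is never added to clearedBuckets.
Set<Integer> clearLocalPrimaries(List<BucketRegion> localPrimaryBucketList,
                                 RegionEventImpl regionEvent) {
  Set<Integer> clearedBuckets = new HashSet<>();
  for (BucketRegion bucket : localPrimaryBucketList) {
    if (!bucket.doLockForPrimary(false)) {
      // primary moved away (e.g. to a member that just finished GII); surface it
      throw new IllegalStateException(
          "bucket " + bucket.getId() + " lost primary during clear");
    }
    try {
      bucket.cmnClearRegion(regionEvent, true, true);  // assumed arguments
      clearedBuckets.add(bucket.getId());              // only buckets cleared as primary are counted
    } finally {
      bucket.doUnlockForPrimary();
    }
  }
  return clearedBuckets;
}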



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
