[ https://issues.apache.org/jira/browse/GEODE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339313#comment-17339313 ]
Xiaojian Zhou edited comment on GEODE-9191 at 5/22/21, 1:06 AM:
----------------------------------------------------------------

More investigation found that primary buckets can switch at any time, especially when they are not balanced (this usually happens during GII). We need to lock the primaries from moving. The revised design is:

(1) The coordinator (a server) calls assignAllBuckets.
(2) The coordinator sends a lock message to all members.
(3) Upon receiving the lock message, each datastore server:
 - waits for all the primaries to show up
 - iterates through its local primary bucket list to lock each primary from moving and to lock its RVV
 - replies with the number of buckets locked
(4) The coordinator collects the numbers of buckets locked. If the total matches the totalBucketNumber, it goes on to clear the region; otherwise it unlocks and retries the whole PR clear. The retry repeats until it succeeds, i.e. until every primary bucket has both its RVV locked and its primary locked from moving.
(5) If a member goes down, the membership listener will detect it and have the coordinator retry. If too many members are down, the wait for primaries should fail with PartitionedRegionPartialClearException; the coordinator then unlocks and throws this exception to the caller.
(6) After all the members' primary buckets are locked, the coordinator sends the clear message to all members.
(7) Each member clears its primary buckets one by one and returns the number of buckets cleared.
(8) The coordinator collects the numbers of buckets cleared; if the total is less than the expected bucket number, it throws PartialClearException to the caller. This can happen when a member goes offline in the middle of the clear.
(9) If any member exits in the middle of the clear, the membership listener at the coordinator will be notified. It unlocks all the locks and retries from locking, then clearing. On retry, if the missing member's buckets have been recreated on another member, the retry succeeds; otherwise the total number of cleared buckets is still lower than expected (i.e. PartitionOffline happened) and PartialClearException is thrown.
(10) If the coordinator exits in the middle of the clear, unlock all the locks and throw PartialClearException.
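To make the lock-collect-clear flow above concrete, here is a minimal coordinator-side sketch of steps (1) through (9). The helper names (assignAllBuckets, sendLockMessageAndCollectLockedCount, sendClearMessageAndCollectClearedCount, unlockAll) and the local PartialClearException class are hypothetical stand-ins for Geode's internal messaging and for PartitionedRegionPartialClearException, not the actual implementation. The membership-listener-driven retry of step (9) and the "too many members down" failure path of step (5) are omitted for brevity.

{code:java}
// Hypothetical coordinator-side sketch of the lock-then-clear protocol above.
// The helpers below stand in for Geode's real lock/clear messaging; only the
// control flow (count locked buckets, retry, count cleared buckets) is shown.
public class PrClearCoordinatorSketch {

  /** Local stand-in for Geode's PartitionedRegionPartialClearException. */
  static class PartialClearException extends RuntimeException {
    PartialClearException(String message) {
      super(message);
    }
  }

  private final int totalBucketNumber;

  public PrClearCoordinatorSketch(int totalBucketNumber) {
    this.totalBucketNumber = totalBucketNumber;
  }

  public void clearRegion() {
    assignAllBuckets(); // (1) make sure every bucket has a primary somewhere

    while (true) {
      // (2)(3)(4) lock primaries and RVVs everywhere, collect locked counts
      int locked = sendLockMessageAndCollectLockedCount();
      if (locked < totalBucketNumber) {
        // a primary moved or a member left: unlock and retry the whole PR clear
        unlockAll();
        continue;
      }

      // (6)(7)(8) clear the locked primaries and collect cleared counts
      int cleared;
      try {
        cleared = sendClearMessageAndCollectClearedCount();
      } finally {
        unlockAll();
      }

      if (cleared < totalBucketNumber) {
        // a member went offline mid-clear and its buckets were not recreated
        throw new PartialClearException(
            "cleared " + cleared + " of " + totalBucketNumber + " buckets");
      }
      return; // every primary bucket was locked and cleared
    }
  }

  // --- placeholders for the real messaging, present only for shape ---
  private void assignAllBuckets() {}
  private int sendLockMessageAndCollectLockedCount() { return totalBucketNumber; }
  private int sendClearMessageAndCollectClearedCount() { return totalBucketNumber; }
  private void unlockAll() {}
}
{code}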
was (Author: zhouxj):
More investigation found that primary buckets can switch at any time, especially when they are not balanced (this usually happens during GII). We need to lock the primaries from moving. The revised design is:
(1) The coordinator (a server) calls assignAllBuckets, then waits for all the primaries to show up.
(2) The coordinator sends a lock message to all members.
(3) Upon receiving the lock message, each datastore server saves the current primary bucket count for future reference.
(4) Each datastore iterates through its local primary bucket list to lock each primary from moving and to lock its RVV. If either the number of primary buckets locked or the number of RVVs locked at this member differs from the previously saved primary bucket count, it unlocks all of them and returns a RetryException to the coordinator.
(5) If the coordinator receives the retry exception, it resends the lock message and retries until it succeeds.
(6) After all the members' primary buckets are locked, the coordinator sends the clear message to all members.
(7) Each member clears its primary buckets one by one and returns the number of buckets cleared.
(8) The coordinator collects the numbers of buckets cleared; if the total is less than the expected bucket number, it throws PartialClearException to the caller. This can happen when a member goes offline in the middle of the clear.
(9) If any member exits in the middle of the clear, the membership listener at the coordinator will be notified. It unlocks all the locks and retries from locking, then clearing. On retry, if the missing member's buckets have been recreated on another member, the retry succeeds; otherwise the total number of cleared buckets is still lower than expected (i.e. PartitionOffline happened) and PartialClearException is thrown.
(10) If the coordinator exits in the middle of the clear, unlock all the locks and throw PartialClearException.


> PR clear could miss clearing bucket which lost primary
> ------------------------------------------------------
>
>                 Key: GEODE-9191
>                 URL: https://issues.apache.org/jira/browse/GEODE-9191
>             Project: Geode
>          Issue Type: Sub-task
>            Reporter: Xiaojian Zhou
>            Assignee: Xiaojian Zhou
>            Priority: Major
>              Labels: GeodeOperationAPI, pull-request-available
>
> This scenario was found when introducing a GII test case for PR clear. The sequence is:
> (1) There are 3 servers: server1 is an accessor, server2 and server3 are datastores.
> (2) Shut down server2.
> (3) Send PR clear from server1 (the accessor) and restart server2 at the same time. There is a race in which server2 does not receive the PartitionedRegionClearMessage.
> (4) server2 finishes GII.
> (5) Only server3 received the PartitionedRegionClearMessage, and it hosts all the primary buckets. While the PR clear thread iterates through these primary buckets one by one, some of them might lose primary to server2.
> (6) BR.cmnClearRegion returns immediately since the bucket is no longer primary, but clearedBuckets.add(localPrimaryBucketRegion.getId()); is still called. So from the caller's point of view, the bucket counts as cleared, and PartitionedRegionPartialClearException is not even thrown.
> The problem: before calling cmnClearRegion, we should call BR.doLockForPrimary to make sure the bucket is still primary, and throw an exception if it is not. Then clearedBuckets.add(localPrimaryBucketRegion.getId()); will not be called for this bucket.
> The expected behavior in this scenario is to throw PartitionedRegionPartialClearException.
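To illustrate the fix the description asks for, the sketch below re-checks primary ownership (via a doLockForPrimary-style call) before clearing each bucket and only records buckets that were actually cleared, so a lost primary surfaces as a short cleared-count instead of a silent success. The BucketStub interface and its method names are hypothetical stand-ins for Geode's BucketRegion API, and the sketch skips a non-primary bucket rather than throwing; it is an illustration of the idea, not the actual patch.

{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical member-side sketch: confirm a bucket is still primary before
// clearing it, and only count buckets that were actually cleared.
public class PrimaryBucketClearSketch {

  /** Minimal stand-in for a bucket region; not Geode's BucketRegion API. */
  interface BucketStub {
    int getId();
    boolean doLockForPrimary(); // false if the primary has moved away
    void unlockPrimary();
    void cmnClearRegion();
  }

  static Set<Integer> clearLocalPrimaries(List<BucketStub> localPrimaryBuckets) {
    Set<Integer> clearedBuckets = new HashSet<>();
    for (BucketStub bucket : localPrimaryBuckets) {
      // The fix: verify this member still holds the primary before clearing.
      if (!bucket.doLockForPrimary()) {
        // Primary moved (e.g. to a member that just finished GII). Do NOT
        // record the bucket as cleared, so the caller sees a partial clear
        // and can throw PartitionedRegionPartialClearException.
        continue;
      }
      try {
        bucket.cmnClearRegion();
        clearedBuckets.add(bucket.getId());
      } finally {
        bucket.unlockPrimary();
      }
    }
    return clearedBuckets;
  }
}
{code}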