[ https://issues.apache.org/jira/browse/GEODE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339313#comment-17339313 ]
Xiaojian Zhou edited comment on GEODE-9191 at 5/4/21, 10:28 PM:
----------------------------------------------------------------

Further investigation found that primary buckets can switch at any time, especially when they are not balanced (this usually happens during GII). We need to lock the primaries to keep them from moving.

The revised design is:
(1) The coordinator (a server) runs assignAllBuckets, then waits for all the primaries to show up.
(2) The coordinator sends a lock message to all members.
(3) Upon receiving the lock message, each datastore server saves its current number of primary buckets for future reference.
(4) Each datastore iterates through its local primary bucket list, locking each primary against moving and locking its RVV. If either the total number of locked primary buckets or the total number of buckets with a locked RVV on this member differs from the previously saved number of primary buckets, it unlocks all of them and returns a RetryException to the coordinator.
(5) If the coordinator receives a RetryException, it resends the lock message and keeps retrying until it succeeds.
(6) After all the members' primary buckets are locked, the coordinator sends a clear message to all the members.
(7) Each member clears its primary buckets one by one and returns the number of buckets cleared.
(8) The coordinator collects the numbers of cleared buckets; if the total is less than the expected number of buckets, it throws PartialClearException to the caller. This can happen when a member goes offline in the middle of the clear.
(9) If any member exits in the middle of the clear, the membership listener at the coordinator is notified. It unlocks all the locks and retries, first locking and then clearing. During the retry, if the missing member's buckets have been recreated on another member, the retry succeeds. Otherwise, the total number of cleared buckets is still lower than expected (i.e. PartitionOffline happened), and PartialClearException is thrown.
(10) If the coordinator exits in the middle of the clear, all the locks are released and PartialClearException is thrown.
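The lock, retry, and count steps of the revised design can be sketched roughly as below. This is only an in-memory illustration, not Geode code: the class PrClearSketch, the Member model, its failedLockAttempts and offline simulation fields, and the stand-in exception classes are all hypothetical names introduced for the example. The membership-listener retry after a member exits is not modelled.

```java
import java.util.List;

/** Hypothetical in-memory model of the lock-then-clear flow; not Geode code. */
final class PrClearSketch {

  /** Stand-in for the RetryException a member returns when its lock counts disagree. */
  static class RetryException extends Exception {}

  /** Stand-in for the PartialClearException thrown to the caller. */
  static class PartialClearException extends RuntimeException {
    PartialClearException(String message) {
      super(message);
    }
  }

  /** One datastore member and the primary buckets it hosts (assumed model). */
  static class Member {
    private final List<Integer> primaryBucketIds;
    private int failedLockAttempts; // simulated: lock attempts during which a primary moves
    private final boolean offline;  // simulated: member is gone by the time clear runs
    private boolean locked;

    Member(List<Integer> primaryBucketIds, int failedLockAttempts, boolean offline) {
      this.primaryBucketIds = primaryBucketIds;
      this.failedLockAttempts = failedLockAttempts;
      this.offline = offline;
    }

    /** Save the primary count, lock, and ask for a retry if the counts differ. */
    void lock() throws RetryException {
      int savedPrimaryCount = primaryBucketIds.size();
      int lockedCount = savedPrimaryCount;
      if (failedLockAttempts > 0) {
        failedLockAttempts--;
        lockedCount--; // simulate one bucket losing primary while locking
      }
      if (lockedCount != savedPrimaryCount) {
        unlock();
        throw new RetryException();
      }
      locked = true;
    }

    void unlock() {
      locked = false;
    }

    boolean isLocked() {
      return locked;
    }

    /** Clear the locked primaries and report how many buckets were cleared. */
    int clearPrimaries() {
      if (offline || !locked) {
        return 0;
      }
      return primaryBucketIds.size();
    }
  }

  /** Coordinator side: lock every member (retrying), clear, then compare the totals. */
  static int clear(List<Member> members, int expectedBuckets) {
    try {
      for (Member member : members) {
        boolean lockedOk = false;
        while (!lockedOk) {
          try {
            member.lock();
            lockedOk = true;
          } catch (RetryException e) {
            // "Resend the lock message": loop and try this member again.
          }
        }
      }
      int cleared = 0;
      for (Member member : members) {
        cleared += member.clearPrimaries();
      }
      if (cleared < expectedBuckets) {
        throw new PartialClearException(
            "cleared " + cleared + " of " + expectedBuckets + " buckets");
      }
      return cleared;
    } finally {
      for (Member member : members) {
        member.unlock(); // always release the locks, on success or failure
      }
    }
  }

  public static void main(String[] args) {
    Member server2 = new Member(List.of(0, 1), 2, false); // needs two retries to lock
    Member server3 = new Member(List.of(2, 3), 0, false);
    System.out.println("cleared " + clear(List.of(server2, server3), 4) + " buckets");
  }
}
```

In this sketch the coordinator only trusts the per-member counts, so a member that drops out simply contributes zero and the shortfall surfaces as a PartialClearException rather than as a silent partial clear.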
was (Author: zhouxj):
Further investigation found that primary buckets can switch at any time, especially when they are not balanced (this usually happens during GII). We need to lock the primaries to keep them from moving.

The revised design is:
(1) The coordinator (a server) runs assignAllBuckets, then waits for all the primaries to show up.
(2) The coordinator sends a lock message to all members.
(3) Upon receiving the lock message, each datastore server calls lockBucketCreationForRegionClear(), then saves its current number of primary buckets.
(4) Each datastore iterates through its local primary bucket list, locking each primary against moving and locking its RVV. If either the total number of locked primary buckets or the total number of buckets with a locked RVV on this member differs from the previously saved number of primary buckets, it unlocks all of them and returns a RetryException to the coordinator.
(5) If the coordinator receives a RetryException, it resends the lock message and keeps retrying until it succeeds.
(6) After all the members' primary buckets are locked, the coordinator sends a clear message to all the members.
(7) Each member clears its primary buckets one by one and returns the number of buckets cleared.
(8) The coordinator collects the numbers of cleared buckets; if the total is less than the expected number of buckets, it throws PartialClearException to the caller. This can happen when a member goes offline in the middle of the clear.
(9) If any member exits in the middle of the clear, then because bucket creation is not allowed during a clear, all the locks are released and the number of buckets cleared is returned to the coordinator. The coordinator will eventually throw PartialClearException.
(10) If the coordinator exits in the middle of the clear, all the locks are released and PartialClearException is thrown.

> PR clear could miss clearing bucket which lost primary
> ------------------------------------------------------
>
>                 Key: GEODE-9191
>                 URL: https://issues.apache.org/jira/browse/GEODE-9191
>             Project: Geode
>          Issue Type: Sub-task
>            Reporter: Xiaojian Zhou
>            Assignee: Xiaojian Zhou
>            Priority: Major
>              Labels: GeodeOperationAPI, pull-request-available
>
> This scenario was found when introducing a GII test case for PR clear. The
> sequence is:
> (1) There are 3 servers: server1 is an accessor, server2 and server3 are
> datastores.
> (2) Shut down server2.
> (3) Send PR clear from server1 (the accessor) and restart server2 at the same
> time. There is a race in which server2 does not receive the
> PartitionedRegionClearMessage.
> (4) server2 finishes GII.
> (5) Only server3 received the PartitionedRegionClearMessage, and it hosts all
> the primary buckets. While the PR clear thread iterates through these primary
> buckets one by one, some of them might lose primary to server2.
> (6) BR.cmnClearRegion returns immediately for a bucket that is no longer
> primary, but clearedBuckets.add(localPrimaryBucketRegion.getId()); is still
> called. So from the caller's point of view, this bucket was cleared, and
> PartitionedRegionPartialClearException is not even thrown.
> The problem is:
> Before calling cmnClearRegion, we should call BR.doLockForPrimary to make
> sure the bucket is still primary. If it is not, throw an exception. Then
> clearedBuckets.add(localPrimaryBucketRegion.getId()); will not be called for
> this bucket.
> The expected behavior in this scenario is to throw
> PartitionedRegionPartialClearException.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)