[ https://issues.apache.org/jira/browse/GEODE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaojian Zhou updated GEODE-9191: --------------------------------- Parent: GEODE-7665 Issue Type: Sub-task (was: Bug) > PR clear should not miss clearing bucket which lost primary > ----------------------------------------------------------- > > Key: GEODE-9191 > URL: https://issues.apache.org/jira/browse/GEODE-9191 > Project: Geode > Issue Type: Sub-task > Reporter: Xiaojian Zhou > Priority: Major > > This scenario is found when introducing GII test case for PR clear. The > sequence is: > (1) there're 3 servers, server1 is accessor, server2 and server3 are > datastores. > (2) shutdown server2 > (3) send PR clear from server1 (accessor) and restart server2 at the same > time. There's a race that server2 did not receive the > PartitionedRegionClearMessage. > (4) server2 finished GII > (5) only server3 received PartitionedRegionClearMessage and it hosts all the > primary buckets. When PR clear thread iterates through these primary buckets > one by one, some of them might lose primary to server2. > (6) BR.cmnClearRegion will return immediately since it's no longer primary, > but clearedBuckets.add(localPrimaryBucketRegion.getId()); will still be > called. So from the caller point of view, this bucket is cleared. It wouldn't > even throw PartitionedRegionPartialClearException. > The problem is: > before calling cmnClearRegion, we should call BR.doLockForPrimary to make > sure it's still primary. If not, throw exception. Then > clearedBuckets.add(localPrimaryBucketRegion.getId()); will not be called for > this bucket. > The expected behavior in this scenario is to throw > PartitionedRegionPartialClearException. -- This message was sent by Atlassian Jira (v8.3.4#803005)