Xiaojian Zhou created GEODE-9191:
------------------------------------
Summary: PR clear should not miss clearing bucket which lost
primary
Key: GEODE-9191
URL: https://issues.apache.org/jira/browse/GEODE-9191
Project: Geode
Issue Type: Bug
Reporter: Xiaojian Zhou
This scenario was found while introducing a GII test case for PR clear. The
sequence is:
(1) There are 3 servers: server1 is an accessor; server2 and server3 are
datastores.
(2) Shut down server2.
(3) Send a PR clear from server1 (the accessor) and restart server2 at the same
time. There is a race in which server2 does not receive the
PartitionedRegionClearMessage.
(4) server2 finishes GII.
(5) Only server3 received the PartitionedRegionClearMessage, and it hosts all the
primary buckets. While the PR clear thread iterates through these primary
buckets one by one, some of them might lose primary to server2.
(6) BR.cmnClearRegion returns immediately because the bucket is no longer
primary, but clearedBuckets.add(localPrimaryBucketRegion.getId()); is still
called. So from the caller's point of view, this bucket is cleared. The clear
does not even throw PartitionedRegionPartialClearException.
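
The flaw in step (6) can be reproduced in a small standalone model. This is
only a sketch: the Bucket class, its fields, and clearAll are hypothetical
stand-ins, not Geode's BucketRegion or PartitionedRegionClear code. It shows a
bucket id being recorded as cleared even though the clear was skipped.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MissedBucketClearSketch {
  /** Hypothetical stand-in for a bucket region; not Geode's BucketRegion. */
  static class Bucket {
    final int id;
    boolean primary;
    int entries;

    Bucket(int id, boolean primary, int entries) {
      this.id = id;
      this.primary = primary;
      this.entries = entries;
    }

    /** Models the reported behavior: return silently when not primary. */
    void cmnClearRegion() {
      if (!primary) {
        return;
      }
      entries = 0;
    }
  }

  /** Clears each bucket and records its id unconditionally, as in step (6). */
  static Set<Integer> clearAll(List<Bucket> buckets) {
    Set<Integer> clearedBuckets = new HashSet<>();
    for (Bucket bucket : buckets) {
      bucket.cmnClearRegion();
      clearedBuckets.add(bucket.id); // recorded even if nothing was cleared
    }
    return clearedBuckets;
  }

  public static void main(String[] args) {
    Bucket kept = new Bucket(0, true, 5);
    Bucket lost = new Bucket(1, false, 7); // lost primary to another member
    Set<Integer> cleared = clearAll(List.of(kept, lost));
    System.out.println("bucket 1 reported cleared: " + cleared.contains(1)
        + ", entries still in bucket 1: " + lost.entries);
  }
}
```

In this model the caller sees both bucket ids in the returned set, although
bucket 1 still holds its 7 entries.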
The problem is that primary status is not re-checked before clearing:
before calling cmnClearRegion, we should call BR.doLockForPrimary to make sure
the bucket is still primary. If it is not, throw an exception, so that
clearedBuckets.add(localPrimaryBucketRegion.getId()); is not called for
this bucket.
The expected behavior in this scenario is to throw
PartitionedRegionPartialClearException.
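
A minimal sketch of the proposed check follows. All names in it are
hypothetical: lockForPrimary only imitates the role of BR.doLockForPrimary,
and PartialClearException stands in for PartitionedRegionPartialClearException.
It models only the primary check, not the actual lock that keeps primary from
moving during the clear. Continuing past a non-primary bucket and throwing
after the loop is an assumption of this sketch, not something the report
specifies.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrimaryCheckedClearSketch {
  /** Hypothetical analogue of PartitionedRegionPartialClearException. */
  static class PartialClearException extends RuntimeException {
    final Set<Integer> clearedBuckets;

    PartialClearException(String message, Set<Integer> clearedBuckets) {
      super(message);
      this.clearedBuckets = clearedBuckets;
    }
  }

  /** Hypothetical stand-in for a bucket region; not Geode's BucketRegion. */
  static class Bucket {
    final int id;
    boolean primary;
    int entries;

    Bucket(int id, boolean primary, int entries) {
      this.id = id;
      this.primary = primary;
      this.entries = entries;
    }

    /** Imitates the primary check: fails if this bucket is not primary. */
    void lockForPrimary() {
      if (!primary) {
        throw new IllegalStateException("bucket " + id + " is not primary");
      }
    }

    void clear() {
      entries = 0;
    }
  }

  /** Records a bucket id only after the primary check passed and it was cleared. */
  static Set<Integer> clearAll(List<Bucket> buckets) {
    Set<Integer> clearedBuckets = new HashSet<>();
    List<Integer> missedBuckets = new ArrayList<>();
    for (Bucket bucket : buckets) {
      try {
        bucket.lockForPrimary();
      } catch (IllegalStateException e) {
        missedBuckets.add(bucket.id); // never added to clearedBuckets
        continue;
      }
      bucket.clear();
      clearedBuckets.add(bucket.id);
    }
    if (!missedBuckets.isEmpty()) {
      throw new PartialClearException(
          "buckets not cleared: " + missedBuckets, clearedBuckets);
    }
    return clearedBuckets;
  }

  public static void main(String[] args) {
    Bucket kept = new Bucket(0, true, 5);
    Bucket lost = new Bucket(1, false, 7); // lost primary to another member
    try {
      clearAll(List.of(kept, lost));
    } catch (PartialClearException e) {
      System.out.println(e.getMessage() + ", cleared: " + e.clearedBuckets);
    }
  }
}
```

With this check in place, the caller is told that the clear was partial
instead of receiving a set that wrongly includes the bucket that lost primary.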