Xiaojian Zhou created GEODE-9191:
------------------------------------

             Summary: PR clear should not miss clearing bucket which lost 
primary
                 Key: GEODE-9191
                 URL: https://issues.apache.org/jira/browse/GEODE-9191
             Project: Geode
          Issue Type: Bug
            Reporter: Xiaojian Zhou


This scenario was found while adding a GII test case for PR clear. The 
sequence is:

(1) There are 3 servers: server1 is an accessor; server2 and server3 are 
datastores.
(2) Shut down server2.
(3) Send a PR clear from server1 (the accessor) and restart server2 at the same 
time. There is a race in which server2 does not receive the 
PartitionedRegionClearMessage.
(4) server2 finishes GII.
(5) Only server3 received the PartitionedRegionClearMessage, and it hosts all 
the primary buckets. While the PR clear thread iterates through these primary 
buckets one by one, some of them might lose primary to server2.
(6) For such a bucket, BR.cmnClearRegion returns immediately since the bucket 
is no longer primary, but clearedBuckets.add(localPrimaryBucketRegion.getId()); 
is still called. So from the caller's point of view the bucket is cleared, and 
PartitionedRegionPartialClearException is not thrown.
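
The bookkeeping problem in step (6) can be modeled in isolation. The sketch 
below is not Geode code: SketchBucket, clearIfPrimary and BuggyClearSketch are 
hypothetical stand-ins that only mimic the described behavior (a clear that 
silently returns when the bucket is not primary, followed by an unconditional 
clearedBuckets.add).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a bucket region; NOT the Geode BucketRegion class.
class SketchBucket {
  final int id;
  boolean primary = true;
  final List<String> entries = new ArrayList<>();

  SketchBucket(int id, String... values) {
    this.id = id;
    entries.addAll(Arrays.asList(values));
  }

  // Mimics the behavior described for cmnClearRegion: a silent no-op
  // when this member no longer holds the primary.
  void clearIfPrimary() {
    if (!primary) {
      return;
    }
    entries.clear();
  }
}

// Hypothetical model of the clear loop as described in step (6).
class BuggyClearSketch {
  static Set<Integer> clearAll(List<SketchBucket> localPrimaries) {
    Set<Integer> clearedBuckets = new LinkedHashSet<>();
    for (SketchBucket bucket : localPrimaries) {
      bucket.clearIfPrimary();
      // Recorded unconditionally, even if the clear above did nothing.
      clearedBuckets.add(bucket.id);
    }
    return clearedBuckets;
  }
}
```

If one bucket has primary set to false before clearAll runs, its id still ends 
up in the returned set while its entries remain, which is the mismatch the 
caller cannot detect.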

Proposed fix:
Before calling cmnClearRegion, we should call BR.doLockForPrimary to make sure 
the bucket is still primary. If it is not, throw an exception, so that 
clearedBuckets.add(localPrimaryBucketRegion.getId()); is not called for this 
bucket. 
The expected behavior in this scenario is to throw 
PartitionedRegionPartialClearException.
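
A minimal sketch of the guarded loop, under stated assumptions. This is not the 
Geode implementation: GuardedBucket, requirePrimary, GuardedClearSketch and 
PartialClearSketchException are hypothetical stand-ins for the bucket region, 
BR.doLockForPrimary, the PR clear loop and 
PartitionedRegionPartialClearException. The sketch also assumes the loop keeps 
going after a failed check and reports the partial result at the end; the 
report does not say whether the real loop should continue or abort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for PartitionedRegionPartialClearException.
class PartialClearSketchException extends RuntimeException {
  PartialClearSketchException(String message) {
    super(message);
  }
}

// Hypothetical stand-in for a bucket region; NOT the Geode BucketRegion class.
class GuardedBucket {
  final int id;
  boolean primary = true;
  final List<String> entries = new ArrayList<>();

  GuardedBucket(int id, String... values) {
    this.id = id;
    entries.addAll(Arrays.asList(values));
  }

  // Stand-in for the primary check requested in the report. It only tests
  // a flag and takes no real lock.
  void requirePrimary() {
    if (!primary) {
      throw new IllegalStateException("bucket " + id + " is no longer primary");
    }
  }
}

// Hypothetical model of the clear loop with the check in place.
class GuardedClearSketch {
  final Set<Integer> clearedBuckets = new LinkedHashSet<>();

  void clearAll(List<GuardedBucket> localPrimaries) {
    List<Integer> skipped = new ArrayList<>();
    for (GuardedBucket bucket : localPrimaries) {
      try {
        bucket.requirePrimary();        // fails before anything is recorded
        bucket.entries.clear();
        clearedBuckets.add(bucket.id);  // reached only after a real clear
      } catch (IllegalStateException e) {
        skipped.add(bucket.id);
      }
    }
    if (!skipped.isEmpty()) {
      // Assumption: surface the partial result once, after the loop.
      throw new PartialClearSketchException("buckets not cleared: " + skipped);
    }
  }
}
```

With this ordering, a bucket that lost primary is never added to 
clearedBuckets, and the caller sees an exception instead of a clear that looks 
complete.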




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
