Barry Oglesby created GEODE-6931:
------------------------------------

             Summary: A failed RemotePutMessage can cause a 
PersistentReplicatesOfflineException to be thrown when no persistent members 
are offline
                 Key: GEODE-6931
                 URL: https://issues.apache.org/jira/browse/GEODE-6931
             Project: Geode
          Issue Type: Bug
          Components: messaging
            Reporter: Barry Oglesby


One of the places that RemotePutMessage is sent from is DistributedRegion.virtualPut.

It is sent from this method in the following configuration:

- 2 WAN sites
- the member in the receiving site that processes the batch defines the region as replicate proxy
- the other receiving site members define the region as replicate persistent

DistributedRegion.virtualPut is invoked by the GatewayReceiverCommand here:
{noformat}
java.lang.Exception: Stack trace
        at java.lang.Thread.dumpStack(Thread.java:1333)
        at org.apache.geode.internal.cache.DistributedRegion.virtualPut(DistributedRegion.java:341)
        at org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
        at org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5549)
        at org.apache.geode.internal.cache.LocalRegion.basicBridgePut(LocalRegion.java:5200)
        at org.apache.geode.internal.cache.tier.sockets.command.GatewayReceiverCommand.cmdExecute(GatewayReceiverCommand.java:429)
{noformat}
In this case, requiresOneHopForMissingEntry called by virtualPut returns true 
since a proxy region with other persistent replicates can't generate a version 
tag. This causes RemotePutMessage.distribute to be called.

If didDistribute returns false from RemotePutMessage.distribute (meaning the 
distribution failed), a PersistentReplicatesOfflineException is thrown 
regardless of the actual exception on the remote member:
{noformat}
if (!generateVersionTag && !didDistribute) {
  throw new PersistentReplicatesOfflineException();
}
{noformat}
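A minimal sketch of why the cause is lost at this point: the check consults only two booleans, so every kind of remote failure ends in the same exception. The class, enum, and exception names below are hypothetical stand-ins written for illustration; they are not Geode classes, and the real distribute method does considerably more.

```java
/** Simplified stand-in for the check in DistributedRegion.virtualPut; not Geode code. */
class VirtualPutCheckSketch {

  /** Hypothetical reasons a remote put might fail (assumption, for illustration only). */
  enum RemoteFailure { NONE, MEMBER_OFFLINE, CONFLICTING_WAN_EVENT }

  /** Stand-in for Geode's PersistentReplicatesOfflineException. */
  static class ReplicatesOfflineSketchException extends RuntimeException {}

  /** Models RemotePutMessage.distribute: whatever went wrong is reduced to a boolean. */
  static boolean distribute(RemoteFailure failure) {
    return failure == RemoteFailure.NONE;
  }

  /** Models the check quoted above: only the two booleans are consulted. */
  static void virtualPut(boolean generateVersionTag, RemoteFailure failure) {
    boolean didDistribute = distribute(failure);
    if (!generateVersionTag && !didDistribute) {
      throw new ReplicatesOfflineSketchException();
    }
  }

  public static void main(String[] args) {
    for (RemoteFailure failure : RemoteFailure.values()) {
      try {
        virtualPut(false, failure);
        System.out.println(failure + ": put succeeded");
      } catch (ReplicatesOfflineSketchException e) {
        System.out.println(failure + ": reported as offline replicates");
      }
    }
  }
}
```

In this model a conflicting WAN event and a genuinely offline member are indistinguishable to the caller.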
One way didDistribute can be false is when both the remote WAN site and the 
local WAN site update the same key at the same time. In that case a 
ConcurrentCacheModificationException can occur in the replicate persistent 
member (the one processing the RemotePutMessage).

This exception is not logged anywhere, and RemotePutMessage operateOnRegion 
doesn't know anything about it.

RemotePutMessage operateOnRegion running in the replicate persistent member 
calls:
{noformat}
result = r.getDataView().putEntry(event, this.ifNew, this.ifOld, this.expectedOldValue,
    this.requireOldValue, this.lastModified, true);
{noformat}
If putEntry returns false, operateOnRegion throws a RemoteOperationException, 
which is sent back to the caller and causes didDistribute to be false.

The result can be false in RemotePutMessage.operateOnRegion because of a 
ConcurrentCacheModificationException:
{noformat}
org.apache.geode.internal.cache.versions.ConcurrentCacheModificationException: conflicting WAN event detected
        at org.apache.geode.internal.cache.entries.AbstractRegionEntry.processGatewayTag(AbstractRegionEntry.java:1924)
        at org.apache.geode.internal.cache.entries.AbstractRegionEntry.processVersionTag(AbstractRegionEntry.java:1443)
        at org.apache.geode.internal.cache.entries.AbstractOplogDiskRegionEntry.processVersionTag(AbstractOplogDiskRegionEntry.java:165)
        at org.apache.geode.internal.cache.entries.VersionedThinDiskLRURegionEntryHeapStringKey1.processVersionTag(VersionedThinDiskLRURegionEntryHeapStringKey1.java:378)
        at org.apache.geode.internal.cache.AbstractRegionMap.processVersionTag(AbstractRegionMap.java:527)
        at org.apache.geode.internal.cache.map.RegionMapPut.updateEntry(RegionMapPut.java:484)
        at org.apache.geode.internal.cache.map.RegionMapPut.createOrUpdateEntry(RegionMapPut.java:256)
        at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutAndDeliverEvent(AbstractRegionMapPut.java:300)
        at org.apache.geode.internal.cache.map.AbstractRegionMapPut.runWithIndexUpdatingInProgress(AbstractRegionMapPut.java:308)
        at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutIfPreconditionsSatisified(AbstractRegionMapPut.java:296)
        at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnSynchronizedRegionEntry(AbstractRegionMapPut.java:282)
        at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutOnRegionEntryInMap(AbstractRegionMapPut.java:273)
        at org.apache.geode.internal.cache.map.AbstractRegionMapPut.addRegionEntryToMapAndDoPut(AbstractRegionMapPut.java:251)
        at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPutRetryingIfNeeded(AbstractRegionMapPut.java:216)
        at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doWithIndexInUpdateMode(AbstractRegionMapPut.java:198)
        at org.apache.geode.internal.cache.map.AbstractRegionMapPut.doPut(AbstractRegionMapPut.java:180)
        at org.apache.geode.internal.cache.map.AbstractRegionMapPut.runWhileLockedForCacheModification(AbstractRegionMapPut.java:119)
        at org.apache.geode.internal.cache.map.RegionMapPut.runWhileLockedForCacheModification(RegionMapPut.java:161)
        at org.apache.geode.internal.cache.map.AbstractRegionMapPut.put(AbstractRegionMapPut.java:169)
        at org.apache.geode.internal.cache.AbstractRegionMap.basicPut(AbstractRegionMap.java:2047)
        at org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5569)
        at org.apache.geode.internal.cache.DistributedRegion.virtualPut(DistributedRegion.java:386)
        at org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
        at org.apache.geode.internal.cache.tx.RemotePutMessage.operateOnRegion(RemotePutMessage.java:635)
        at org.apache.geode.internal.cache.tx.RemoteOperationMessage.process(RemoteOperationMessage.java:195)
{noformat}
This exception is caught in LocalRegion.virtualPut but not logged, so there is 
no evidence of it. LocalRegion.virtualPut just returns false in that case.
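The path from the swallowed exception to the generic failure can be modeled roughly as follows. This is a toy sketch, not the Geode source: the exception classes are local stand-ins, the method bodies are reduced to the control flow described above, and the "put failed" message is a hypothetical placeholder.

```java
/** Toy model of the failure path described above; none of these types are Geode classes. */
class SwallowedConflictSketch {

  /** Stand-in for ConcurrentCacheModificationException. */
  static class ConflictSketchException extends RuntimeException {
    ConflictSketchException(String message) {
      super(message);
    }
  }

  /** Stand-in for RemoteOperationException. */
  static class RemoteOperationSketchException extends Exception {
    RemoteOperationSketchException(String message) {
      super(message);
    }
  }

  /** Models LocalRegion.virtualPut: the conflict is caught, not logged, and becomes false. */
  static boolean virtualPut(boolean conflictingWanEvent) {
    try {
      if (conflictingWanEvent) {
        throw new ConflictSketchException("conflicting WAN event detected");
      }
      return true;
    } catch (ConflictSketchException e) {
      return false; // the exception and its message are dropped here
    }
  }

  /** Models RemotePutMessage.operateOnRegion: a false result becomes a generic failure. */
  static void operateOnRegion(boolean conflictingWanEvent)
      throws RemoteOperationSketchException {
    boolean result = virtualPut(conflictingWanEvent);
    if (!result) {
      // Hypothetical message; the point is that it carries no trace of the conflict.
      throw new RemoteOperationSketchException("put failed");
    }
  }

  public static void main(String[] args) {
    try {
      operateOnRegion(true);
    } catch (RemoteOperationSketchException e) {
      System.out.println("caller sees: " + e.getMessage());
    }
  }
}
```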

So, to the caller, it looks like a persistent replicate is offline when it 
isn't.

A GatewayConflictResolver can help detect this case. If the resolver accepts 
the WAN event, then the exceptions do not occur. If the resolver rejects the 
WAN event, then the exceptions will occur.
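As a rough illustration of using a resolver for detection, the sketch below records each WAN event it rejects, which leaves the evidence that the swallowed exception does not. So that it compiles without Geode on the classpath, EventSketch and HelperSketch are local stand-ins modeled on the shape of Geode's TimestampedEntryEvent and GatewayConflictHelper; a real resolver would implement org.apache.geode.cache.util.GatewayConflictResolver instead. The "older timestamp loses" rule and the event(...) factory are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a conflict-recording resolver; the nested interfaces are local stand-ins. */
class ConflictRecordingResolverSketch {

  /** Stand-in for the parts of TimestampedEntryEvent used here. */
  interface EventSketch {
    Object getKey();
    long getOldTimestamp(); // timestamp of the entry already in the cache
    long getNewTimestamp(); // timestamp carried by the incoming WAN event
  }

  /** Stand-in for GatewayConflictHelper.disallowEvent(). */
  interface HelperSketch {
    void disallowEvent();
  }

  /** Rejects WAN events older than the local entry and remembers what it rejected. */
  static class RecordingResolver {
    final List<String> rejected = new ArrayList<>();

    void onEvent(EventSketch event, HelperSketch helper) {
      if (event.getNewTimestamp() < event.getOldTimestamp()) {
        // In a real resolver this would be a log statement.
        rejected.add("rejected WAN event for key " + event.getKey());
        helper.disallowEvent();
      }
    }
  }

  /** Hypothetical factory so the example can build events without Geode. */
  static EventSketch event(Object key, long oldTimestamp, long newTimestamp) {
    return new EventSketch() {
      public Object getKey() { return key; }
      public long getOldTimestamp() { return oldTimestamp; }
      public long getNewTimestamp() { return newTimestamp; }
    };
  }

  public static void main(String[] args) {
    RecordingResolver resolver = new RecordingResolver();
    boolean[] disallowed = {false};
    resolver.onEvent(event("key-1", 200L, 100L), () -> disallowed[0] = true);
    System.out.println(resolver.rejected + " disallowed=" + disallowed[0]);
  }
}
```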

All these exceptions really mean is that the WAN event was rejected because it 
conflicted with a local event on the same key.

It would be nice if, instead of returning a generic RemoteOperationException, 
RemotePutMessage.operateOnRegion could return the actual 
ConcurrentCacheModificationException (or at least a 
RemoteOperationException with the ConcurrentCacheModificationException 
message). Short of that, logging the ConcurrentCacheModificationException and 
throwing something other than the PersistentReplicatesOfflineException in 
DistributedRegion.virtualPut would be better.
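One possible shape for that change, sketched as a toy model under stated assumptions rather than as a patch: the remote side keeps the conflict as the cause of the failure it reports, and the caller only falls back to the offline exception when no more specific cause is available. All types, both boolean parameters, and the message strings are hypothetical; nothing here is Geode code.

```java
/** Toy model of the suggested behavior; all types here are hypothetical stand-ins. */
class CauseAwareFailureSketch {

  /** Stand-in for ConcurrentCacheModificationException. */
  static class ConflictSketchException extends RuntimeException {
    ConflictSketchException(String message) {
      super(message);
    }
  }

  /** Stand-in for RemoteOperationException, here carrying the original cause. */
  static class RemoteOperationSketchException extends Exception {
    RemoteOperationSketchException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Stand-in for PersistentReplicatesOfflineException. */
  static class ReplicatesOfflineSketchException extends RuntimeException {}

  /** Models the remote side: the conflict is preserved instead of collapsing to false. */
  static void operateOnRegion(boolean conflictingWanEvent, boolean memberUnreachable)
      throws RemoteOperationSketchException {
    if (memberUnreachable) {
      throw new RemoteOperationSketchException("no reply from member", null);
    }
    try {
      if (conflictingWanEvent) {
        throw new ConflictSketchException("conflicting WAN event detected");
      }
    } catch (ConflictSketchException e) {
      throw new RemoteOperationSketchException("put rejected: " + e.getMessage(), e);
    }
  }

  /** Models the caller: a known cause is rethrown; otherwise fall back to "offline". */
  static void virtualPut(boolean conflictingWanEvent, boolean memberUnreachable) {
    try {
      operateOnRegion(conflictingWanEvent, memberUnreachable);
    } catch (RemoteOperationSketchException e) {
      if (e.getCause() instanceof ConflictSketchException) {
        throw (ConflictSketchException) e.getCause();
      }
      throw new ReplicatesOfflineSketchException();
    }
  }

  public static void main(String[] args) {
    try {
      virtualPut(true, false);
    } catch (ConflictSketchException e) {
      System.out.println("caller sees: " + e.getMessage());
    }
  }
}
```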






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
