[ 
https://issues.apache.org/jira/browse/GEODE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donal Evans resolved GEODE-7643.
--------------------------------
    Fix Version/s: 1.12.0
       Resolution: Fixed

> Gateway unprocessedTokensMap appears to grow without bounds with replicated 
> regions and peer accessors
> ------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-7643
>                 URL: https://issues.apache.org/jira/browse/GEODE-7643
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: Donal Evans
>            Assignee: Donal Evans
>            Priority: Major
>             Fix For: 1.12.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When peer accessors put to a replicated region that has a serial gateway 
> sender, using multiple threads on the same key, a 
> {{ConcurrentCacheModificationException}} caught in {{LocalRegion.virtualPut}} 
> causes {{notifyGatewaySender}} to be called, which puts the event into the 
> queue. Because {{AbstractUpdateOperation.doPutOrCreate}} can call 
> {{LocalRegion.virtualPut}} up to three times and encounter a 
> {{ConcurrentCacheModificationException}} each time, the same event can be 
> put into the queue twice but removed only once, which causes the 
> unprocessedTokensMap to accumulate events.
> Here are the two stacks:
> {noformat}
> [warn 2019/12/02 12:47:59.102 PST <P2P message reader for 
> 10.255.202.119(gateway-ln-2:85182)<v97>:41004 unshared ordered uid=11 dom #2 
> port=59034> tid=0x61] XXX LocalRegion.virtualPut caught 
> ConcurrentCacheModificationException about to notifyGatewaySender 
> eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=5;sequenceID=273];
>  ifNew=false; ifOld=true; overwriteDestroyed=false; eventIdentity=329453507; 
> eventValue=Trade[id=-1501795011; cusip=PVTL; shares=29; price=163; 
> payloadLength=0 bytes]
> java.lang.Exception
>       at 
> org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5591)
>       at 
> org.apache.geode.internal.cache.DistributedRegion.virtualPut(DistributedRegion.java:385)
>       at 
> org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
>       at 
> org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5561)
>       at 
> org.apache.geode.internal.cache.AbstractUpdateOperation.doPutOrCreate(AbstractUpdateOperation.java:182)
>       at 
> org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.basicOperateOnRegion(AbstractUpdateOperation.java:287)
>       at 
> org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.operateOnRegion(AbstractUpdateOperation.java:258)
>       at 
> org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1206)
>       at 
> org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1108)
>       at 
> org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:372)
>       at 
> org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:427)
> {noformat}
> {noformat}
> [warn 2019/12/02 12:47:59.108 PST <P2P message reader for 
> 10.255.202.119(gateway-ln-2:85182)<v97>:41004 unshared ordered uid=11 dom #2 
> port=59034> tid=0x61] XXX LocalRegion.virtualPut caught 
> ConcurrentCacheModificationException about to notifyGatewaySender 
> eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=5;sequenceID=273];
>  ifNew=false; ifOld=false; overwriteDestroyed=true; eventIdentity=329453507; 
> eventValue=Trade[id=-1501795011; cusip=PVTL; shares=29; price=163; 
> payloadLength=0 bytes]
> java.lang.Exception
>       at 
> org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5591)
>       at 
> org.apache.geode.internal.cache.DistributedRegion.virtualPut(DistributedRegion.java:385)
>       at 
> org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
>       at 
> org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5561)
>       at 
> org.apache.geode.internal.cache.AbstractUpdateOperation.doPutOrCreate(AbstractUpdateOperation.java:194)
>       at 
> org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.basicOperateOnRegion(AbstractUpdateOperation.java:287)
>       at 
> org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.operateOnRegion(AbstractUpdateOperation.java:258)
>       at 
> org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1206)
>       at 
> org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1108)
> {noformat}
> Here are the corresponding puts into the queue:
> {noformat}
> [warn 2019/12/02 12:47:59.104 PST <P2P message reader for 
> 10.255.202.119(gateway-ln-2:85182)<v97>:41004 unshared ordered uid=11 dom #2 
> port=59034> tid=0x61] XXX SerialGatewaySenderQueue.putAndGetKey key=3625; 
> eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273];
>  eventValue=Trade[id=-1501795011; cusip=PVTL; shares=29; 
> price=163.08897399902344; payloadLength=0 bytes]
> {noformat}
> {noformat}
> [warn 2019/12/02 12:47:59.110 PST <P2P message reader for 
> 10.255.202.119(gateway-ln-2:85182)<v97>:41004 unshared ordered uid=11 dom #2 
> port=59034> tid=0x61] XXX SerialGatewaySenderQueue.putAndGetKey key=3635; 
> eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273];
>  eventValue=Trade[id=-1501795011; cusip=PVTL; shares=29; 
> price=163.08897399902344; payloadLength=0 bytes]
> {noformat}
> On the secondary, when the event is received via normal replication, it is 
> added to the unprocessedEvents map:
> {noformat}
> [warn 2019/12/02 12:47:59.100 PST <P2P message reader for 
> 10.255.202.119(accessor-ln-1:85194)<v98>:41005 unshared ordered uid=13 dom #1 
> port=59022> tid=0x58] 
> SerialGatewaySenderEventProcessor.basicHandleSecondaryEvent put 
> unprocessedEvents 
> eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273]
> {noformat}
> The first replication from the primary queue is then received, which removes 
> the event from the unprocessedEvents map:
> {noformat}
> [warn 2019/12/02 12:47:59.104 PST <P2P message reader for 
> 10.255.202.119(gateway-ln-1:85170)<v96>:41003 unshared ordered uid=18 dom #3 
> port=59052> tid=0x68] XXX SerialSecondaryGatewayListener.afterCreate 
> senderEvent=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273]
> [warn 2019/12/02 12:47:59.104 PST <Queued Gateway Listener Thread1> tid=0x5e] 
> SerialGatewaySenderEventProcessor.basicHandlePrimaryEvent removed 
> unprocessedEvents 
> eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273];
>  
> value=org.apache.geode.internal.cache.wan.AbstractGatewaySender$EventWrapper@3f6df03b
> {noformat}
> Then the second replication from the primary queue is received, which 
> incorrectly adds the event to the unprocessedTokens map, where it stays 
> forever:
> {noformat}
> [warn 2019/12/02 12:47:59.110 PST <P2P message reader for 
> 10.255.202.119(gateway-ln-1:85170)<v96>:41003 unshared ordered uid=18 dom #3 
> port=59052> tid=0x68] XXX SerialSecondaryGatewayListener.afterCreate 
> senderEvent=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273]
> [warn 2019/12/02 12:47:59.110 PST <Queued Gateway Listener Thread1> tid=0x5e] 
> SerialGatewaySenderEventProcessor.basicHandlePrimaryEvent put 
> unprocessedTokens 
> eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273];
>  value=1575319799110; size=914
> {noformat}
>  
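The sequence in the logs above can be reproduced with a small toy model of the secondary's bookkeeping. This is only a sketch, not Geode's code: the class and method names are hypothetical, the two maps are plain HashMaps keyed by a string, and the token-cancelling branch in handleSecondaryEvent is an assumption about how tokens are normally cleared.

```java
import java.util.HashMap;
import java.util.Map;

class SecondaryBookkeepingSketch {
    // Hypothetical stand-ins for the two maps named in the log output.
    private final Map<String, Object> unprocessedEvents = new HashMap<>();
    private final Map<String, Long> unprocessedTokens = new HashMap<>();

    // The event reaches the secondary via normal region replication.
    void handleSecondaryEvent(String eventId, Object event) {
        // Assumption: a token left by an earlier queue replication cancels out;
        // otherwise the event waits in unprocessedEvents.
        if (unprocessedTokens.remove(eventId) == null) {
            unprocessedEvents.put(eventId, event);
        }
    }

    // The event reaches the secondary via replication of the primary's queue.
    void handlePrimaryEvent(String eventId) {
        if (unprocessedEvents.remove(eventId) == null) {
            // Nothing to match against: record a token with a timestamp.
            unprocessedTokens.put(eventId, System.currentTimeMillis());
        }
    }

    int eventCount() { return unprocessedEvents.size(); }
    int tokenCount() { return unprocessedTokens.size(); }

    public static void main(String[] args) {
        SecondaryBookkeepingSketch secondary = new SecondaryBookkeepingSketch();
        String id = "threadID=5;sequenceID=273";
        secondary.handleSecondaryEvent(id, "Trade"); // normal replication
        secondary.handlePrimaryEvent(id);            // first queue put: removes the event
        secondary.handlePrimaryEvent(id);            // duplicate queue put: leaves a token
        System.out.println("events=" + secondary.eventCount()
            + " tokens=" + secondary.tokenCount()); // prints events=0 tokens=1
    }
}
```

In this model nothing ever removes the leftover token, because the event with that ID has already been replicated and will not arrive again.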
> The proposed solution to this issue is to add two boolean arguments to the 
> {{LocalRegion.virtualPut}} method: one to control whether a 
> {{ConcurrentCacheModificationException}} should result in notifying the 
> bridge clients and gateway senders, and another to control whether any 
> {{ConcurrentCacheModificationException}} encountered should be thrown or 
> suppressed. These arguments allow the 
> {{AbstractUpdateOperation.doPutOrCreate}} method to (1) prevent subsequent 
> calls to {{LocalRegion.virtualPut}} following a 
> {{ConcurrentCacheModificationException}} from notifying the gateway sender, 
> and (2) know whether the {{LocalRegion.virtualPut}} method failed 
> specifically due to a {{ConcurrentCacheModificationException}}.
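A minimal sketch of how the two flags described above might interact, under stated assumptions. It does not reproduce the actual patch: the flag names notifyOnConflict and throwOnConflict, the ConcurrentModification stand-in exception, the alwaysConflict knob, and the list used as a gateway queue are all hypothetical, and the real virtualPut takes many more arguments.

```java
import java.util.ArrayList;
import java.util.List;

class VirtualPutFlagsSketch {
    // Stand-in for ConcurrentCacheModificationException.
    static class ConcurrentModification extends RuntimeException {}

    // Stand-in for the serial gateway sender queue.
    final List<String> gatewayQueue = new ArrayList<>();
    // Test knob: when true, every put attempt loses the conflict check.
    boolean alwaysConflict = true;

    // Toy virtualPut with the two hypothetical boolean flags.
    boolean virtualPut(String eventId, boolean notifyOnConflict, boolean throwOnConflict) {
        try {
            if (alwaysConflict) {
                throw new ConcurrentModification();
            }
            gatewayQueue.add(eventId); // in this toy model a successful put notifies once
            return true;
        } catch (ConcurrentModification e) {
            if (notifyOnConflict) {
                gatewayQueue.add(eventId); // notify the gateway sender despite the conflict
            }
            if (throwOnConflict) {
                throw e; // let the caller see that the failure was a conflict
            }
            return false;
        }
    }

    // Toy doPutOrCreate: up to three attempts, notifying on at most one conflict.
    boolean doPutOrCreate(String eventId) {
        boolean notify = true;
        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                if (virtualPut(eventId, notify, true)) {
                    return true;
                }
            } catch (ConcurrentModification e) {
                notify = false; // later attempts must not enqueue the event again
            }
        }
        return false;
    }

    public static void main(String[] args) {
        VirtualPutFlagsSketch region = new VirtualPutFlagsSketch();
        region.doPutOrCreate("threadID=5;sequenceID=273");
        System.out.println("queued=" + region.gatewayQueue.size()); // prints queued=1
    }
}
```

With the notify flag left at true on every attempt, the same run would enqueue the event three times, which is the duplication this issue describes.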



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
