Donal Evans created GEODE-7643:
----------------------------------
Summary: Gateway unprocessedTokensMap appears to grow without
bounds with replicated regions and peer accessors
Key: GEODE-7643
URL: https://issues.apache.org/jira/browse/GEODE-7643
Project: Geode
Issue Type: Bug
Components: wan
Reporter: Donal Evans
When peer accessors do puts to a replicated region with a serial gateway sender
via multiple threads and on the same key,
{{ConcurrentCacheModificationException}} in {{LocalRegion.virtualPut}} causes
{{notifyGatewaySender}} to be called, which puts the event into the queue,
leading to the event being put in the queue twice but only removed once and
causing the unprocessedTokensMap to accumulate events.
Here are the two stacks:
{noformat}
[warn 2019/12/02 12:47:59.102 PST <P2P message reader for
10.255.202.119(gateway-ln-2:85182)<v97>:41004 unshared ordered uid=11 dom #2
port=59034> tid=0x61] XXX LocalRegion.virtualPut caught
ConcurrentCacheModificationException about to notifyGatewaySender
eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=5;sequenceID=273];
ifNew=false; ifOld=true; overwriteDestroyed=false; eventIdentity=329453507;
eventValue=Trade[id=-1501795011; cusip=PVTL; shares=29; price=163;
payloadLength=0 bytes]
java.lang.Exception
at
org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5591)
at
org.apache.geode.internal.cache.DistributedRegion.virtualPut(DistributedRegion.java:385)
at
org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
at
org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5561)
at
org.apache.geode.internal.cache.AbstractUpdateOperation.doPutOrCreate(AbstractUpdateOperation.java:182)
at
org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.basicOperateOnRegion(AbstractUpdateOperation.java:287)
at
org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.operateOnRegion(AbstractUpdateOperation.java:258)
at
org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1206)
at
org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1108)
at
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:372)
at
org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:427)
{noformat}
{noformat}
[warn 2019/12/02 12:47:59.108 PST <P2P message reader for
10.255.202.119(gateway-ln-2:85182)<v97>:41004 unshared ordered uid=11 dom #2
port=59034> tid=0x61] XXX LocalRegion.virtualPut caught
ConcurrentCacheModificationException about to notifyGatewaySender
eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=5;sequenceID=273];
ifNew=false; ifOld=false; overwriteDestroyed=true; eventIdentity=329453507;
eventValue=Trade[id=-1501795011; cusip=PVTL; shares=29; price=163;
payloadLength=0 bytes]
java.lang.Exception
at
org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5591)
at
org.apache.geode.internal.cache.DistributedRegion.virtualPut(DistributedRegion.java:385)
at
org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
at
org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5561)
at
org.apache.geode.internal.cache.AbstractUpdateOperation.doPutOrCreate(AbstractUpdateOperation.java:194)
at
org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.basicOperateOnRegion(AbstractUpdateOperation.java:287)
at
org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.operateOnRegion(AbstractUpdateOperation.java:258)
at
org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1206)
at
org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1108)
{noformat}
Here are the corresponding puts into the queue:
{noformat}
[warn 2019/12/02 12:47:59.104 PST <P2P message reader for
10.255.202.119(gateway-ln-2:85182)<v97>:41004 unshared ordered uid=11 dom #2
port=59034> tid=0x61] XXX SerialGatewaySenderQueue.putAndGetKey key=3625;
eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273];
eventValue=Trade[id=-1501795011; cusip=PVTL; shares=29;
price=163.08897399902344; payloadLength=0 bytes]
{noformat}
{noformat}
[warn 2019/12/02 12:47:59.110 PST <P2P message reader for
10.255.202.119(gateway-ln-2:85182)<v97>:41004 unshared ordered uid=11 dom #2
port=59034> tid=0x61] XXX SerialGatewaySenderQueue.putAndGetKey key=3635;
eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273];
eventValue=Trade[id=-1501795011; cusip=PVTL; shares=29;
price=163.08897399902344; payloadLength=0 bytes]
{noformat}
On the secondary, when the event is received via normal replication, its added
to the unprocessedEvents map:
{noformat}
[warn 2019/12/02 12:47:59.100 PST <P2P message reader for
10.255.202.119(accessor-ln-1:85194)<v98>:41005 unshared ordered uid=13 dom #1
port=59022> tid=0x58]
SerialGatewaySenderEventProcessor.basicHandleSecondaryEvent put
unprocessedEvents
eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273]
{noformat}
The first replication from the primary queue is received which removes the
event from the unprocessedEvents map:
{noformat}
[warn 2019/12/02 12:47:59.104 PST <P2P message reader for
10.255.202.119(gateway-ln-1:85170)<v96>:41003 unshared ordered uid=18 dom #3
port=59052> tid=0x68] XXX SerialSecondaryGatewayListener.afterCreate
senderEvent=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273]
[warn 2019/12/02 12:47:59.104 PST <Queued Gateway Listener Thread1> tid=0x5e]
SerialGatewaySenderEventProcessor.basicHandlePrimaryEvent removed
unprocessedEvents
eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273];
value=org.apache.geode.internal.cache.wan.AbstractGatewaySender$EventWrapper@3f6df03b
{noformat}
Then the second replication from the primary queue is received which
incorrectly adds the event to the unprocessedTokens map where is stays forever:
{noformat}
[warn 2019/12/02 12:47:59.110 PST <P2P message reader for
10.255.202.119(gateway-ln-1:85170)<v96>:41003 unshared ordered uid=18 dom #3
port=59052> tid=0x68] XXX SerialSecondaryGatewayListener.afterCreate
senderEvent=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273]
[warn 2019/12/02 12:47:59.110 PST <Queued Gateway Listener Thread1> tid=0x5e]
SerialGatewaySenderEventProcessor.basicHandlePrimaryEvent put unprocessedTokens
eventId=EventID[10.255.202.119(accessor-ln-1)<v98>:41005;threadID=0x30002|5;sequenceID=273];
value=1575319799110; size=914
{noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)