[
https://issues.apache.org/jira/browse/GEODE-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863350#comment-16863350
]
Barry Oglesby commented on GEODE-6859:
--------------------------------------
Here is some additional logging showing the behavior:
The shadow PR for GatewaySender mysender is created:
{noformat}
[warn 2019/06/13 10:24:36.546 PDT <Function Execution Processor2> tid=0x3a] XXX
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR
senderId=mysender; userPR=/test
[warn 2019/06/13 10:24:36.546 PDT <Function Execution Processor2> tid=0x3a] XXX
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR
senderId=mysender; prQName=mysender_PARALLEL_GATEWAY_SENDER_QUEUE; prQ=null
[warn 2019/06/13 10:24:36.597 PDT <Function Execution Processor2> tid=0x3a] XXX
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR created queue
senderId=mysender; prQName=mysender_PARALLEL_GATEWAY_SENDER_QUEUE;
prQ=Partitioned Region @7951061f
[path='/mysender_PARALLEL_GATEWAY_SENDER_QUEUE'; dataPolicy=PARTITION; prId=2;
isDestroyed=false; isClosed=false; retryTimeout=3600000; serialNumber=125;
partition
attributes=PartitionAttributes@639507262[redundantCopies=0;localMaxMemory=100;totalMaxMemory=2147483647;totalNumBuckets=113;partitionResolver=null;colocatedWith=/test;recoveryDelay=-1;startupRecoveryDelay=0;FixedPartitionAttributes=null;partitionListeners=null];
on VM 192.168.1.2(server:4637)<v1>:41001]
{noformat}
The shadow PR for GatewaySender mysender2 is created:
{noformat}
[warn 2019/06/13 10:24:43.064 PDT <Function Execution Processor2> tid=0x3a] XXX
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR
senderId=mysender2; userPR=/test
[warn 2019/06/13 10:24:43.064 PDT <Function Execution Processor2> tid=0x3a] XXX
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR
senderId=mysender2; prQName=mysender2_PARALLEL_GATEWAY_SENDER_QUEUE; prQ=null
[warn 2019/06/13 10:24:43.069 PDT <Function Execution Processor2> tid=0x3a] XXX
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR created queue
senderId=mysender2; prQName=mysender2_PARALLEL_GATEWAY_SENDER_QUEUE;
prQ=Partitioned Region @1c5b3979
[path='/mysender2_PARALLEL_GATEWAY_SENDER_QUEUE'; dataPolicy=PARTITION; prId=3;
isDestroyed=false; isClosed=false; retryTimeout=3600000; serialNumber=466;
partition
attributes=PartitionAttributes@635010394[redundantCopies=0;localMaxMemory=100;totalMaxMemory=2147483647;totalNumBuckets=113;partitionResolver=null;colocatedWith=/test;recoveryDelay=-1;startupRecoveryDelay=0;FixedPartitionAttributes=null;partitionListeners=null];
on VM 192.168.1.2(server:4637)<v1>:41001]
{noformat}
GatewaySender mysender is destroyed:
{noformat}
[warn 2019/06/13 10:24:43.889 PDT <Function Execution Processor2> tid=0x3a] XXX
AbstractGatewaySender.destroy region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE
[warn 2019/06/13 10:24:43.889 PDT <Function Execution Processor2> tid=0x3a] XXX
PartitionedRegion.destroyRegion region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE
{noformat}
That causes PartitionedRegionDataStore.cleanUp to set shadowBucketDestroyed to
true for all the buckets of the test region:
{noformat}
[warn 2019/06/13 10:24:43.890 PDT <Function Execution Processor2> tid=0x3a] XXX
PartitionedRegionDataStore.cleanUp
region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE
[warn 2019/06/13 10:24:43.895 PDT <Function Execution Processor2> tid=0x3a] XXX
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=0; destroyed=true
[warn 2019/06/13 10:24:43.896 PDT <Function Execution Processor2> tid=0x3a] XXX
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=1; destroyed=true
[warn 2019/06/13 10:24:43.897 PDT <Function Execution Processor2> tid=0x3a] XXX
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=2; destroyed=true
[warn 2019/06/13 10:24:43.898 PDT <Function Execution Processor2> tid=0x3a] XXX
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=3; destroyed=true
[warn 2019/06/13 10:24:43.899 PDT <Function Execution Processor2> tid=0x3a] XXX
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=4; destroyed=true
[warn 2019/06/13 10:24:43.899 PDT <Function Execution Processor2> tid=0x3a] ...
[warn 2019/06/13 10:24:43.942 PDT <Function Execution Processor2> tid=0x3a] XXX
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=51; destroyed=true
[warn 2019/06/13 10:24:43.942 PDT <Function Execution Processor2> tid=0x3a] ...
[warn 2019/06/13 10:24:43.959 PDT <Function Execution Processor2> tid=0x3a] XXX
PartitionedRegionDataStore.cleanUp complete
region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE
{noformat}
The put is delivered to mysender2's ParallelGatewaySenderQueue, but
shadowBucketDestroyed is still true for bucket 51 of the test region from the
cleanUp above, so the event is dropped:
{noformat}
[warn 2019/06/13 10:24:44.011 PDT <Function Execution Processor2> tid=0x3a] XXX
ParallelGatewaySenderQueue.put
brq=/__PR/_B__mysender2__PARALLEL__GATEWAY__SENDER__QUEUE_51
[warn 2019/06/13 10:24:44.012 PDT <Function Execution Processor2> tid=0x3a] XXX
ParallelGatewaySenderQueue.put
brq=/__PR/_B__mysender2__PARALLEL__GATEWAY__SENDER__QUEUE_51;
shadowBucketDestroyed=true
[warn 2019/06/13 10:24:44.012 PDT <Function Execution Processor2> tid=0x3a] XXX
ParallelGatewaySenderQueue.put not putting entry into queue as shadowPR bucket
is destroyed: key=164; value=GatewaySenderEventImpl[id=EventID[id=24
bytes;threadID=0x1010033|1;sequenceID=122;bucketId=51];action=0;operation=CREATE;region=/test;key=3;value=3;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument
[originalCallbackArg=null;originatingSenderId=-1;recipientGatewayReceivers={4}];possibleDuplicate=false;creationTime=1560446684011;shadowKey=164;timeStamp=1560446684010;acked=false;dispatched=false;bucketId=51;isConcurrencyConflict=false]
{noformat}
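The sequence in the logs above can be reproduced with a small standalone model. This is only a sketch: SharedFlagDemo, DataBucket and SenderQueue are hypothetical stand-ins, not Geode classes, and the model assumes nothing beyond what the logs show (one destroyed flag per data-region bucket, shared by every shadow queue colocated with that region).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the behavior in the logs; NOT Geode code.
final class SharedFlagDemo {

  /** Stands in for a data-region bucket's advisor: one flag shared by all shadow queues. */
  static final class DataBucket {
    private volatile boolean shadowBucketDestroyed;

    void setShadowBucketDestroyed(boolean destroyed) {
      shadowBucketDestroyed = destroyed;
    }

    boolean getShadowBucketDestroyed() {
      return shadowBucketDestroyed;
    }
  }

  /** Stands in for one sender's shadow queue, colocated with the data region's buckets. */
  static final class SenderQueue {
    private final String senderId;
    private final Map<Integer, DataBucket> dataBuckets;
    private final Map<Object, Object> entries = new HashMap<>();
    private boolean destroyed;

    SenderQueue(String senderId, Map<Integer, DataBucket> dataBuckets) {
      this.senderId = senderId;
      this.dataBuckets = dataBuckets;
    }

    String id() {
      return senderId;
    }

    /** Mirrors the cleanUp step: flags every bucket of the colocated data region. */
    void destroy() {
      destroyed = true;
      for (DataBucket bucket : dataBuckets.values()) {
        bucket.setShadowBucketDestroyed(true);
      }
    }

    /** Mirrors the check in put: returns false when the event is dropped. */
    boolean put(int bucketId, Object key, Object value) {
      boolean bucketDestroyed =
          dataBuckets.get(bucketId).getShadowBucketDestroyed() || destroyed;
      if (bucketDestroyed) {
        return false;
      }
      entries.put(key, value);
      return true;
    }

    int size() {
      return entries.size();
    }
  }

  public static void main(String[] args) {
    Map<Integer, DataBucket> testRegionBuckets = new HashMap<>();
    for (int i = 0; i < 113; i++) {
      testRegionBuckets.put(i, new DataBucket());
    }
    SenderQueue mysender = new SenderQueue("mysender", testRegionBuckets);
    SenderQueue mysender2 = new SenderQueue("mysender2", testRegionBuckets);

    mysender.destroy(); // sets the shared flag on every bucket of the data region
    boolean queued = mysender2.put(51, "3", "3");
    System.out.println(mysender2.id() + " queued event: " + queued);
  }
}
```

In this model mysender2 is never destroyed, yet its put is rejected because the flag it consults belongs to the data region's bucket rather than to its own queue.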
> Destroying a parallel gateway sender attached to a region causes other
> senders attached to that same region to no longer queue events
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: GEODE-6859
> URL: https://issues.apache.org/jira/browse/GEODE-6859
> Project: Geode
> Issue Type: Bug
> Components: wan
> Reporter: Barry Oglesby
> Priority: Major
>
> This scenario causes the put event not to be added to the remaining sender's queue:
> - create gateway sender sender1
> - create region attached to sender1
> - create gateway sender sender2
> - alter region to be attached to sender2
> - destroy sender1
> - put an entry into region
> Here are the steps using gfsh:
> {noformat}
> gfsh>create gateway-sender --id=mysender --remote-distributed-system-id=2 --enable-persistence --parallel
> gfsh>create region --name=test --gateway-sender-id=mysender --type=PARTITION_PERSISTENT
> gfsh>create gateway-sender --id=mysender2 --remote-distributed-system-id=4 --enable-persistence --parallel
> gfsh>alter region --name=test --gateway-sender-id=mysender2
> gfsh>destroy gateway-sender --id=mysender
> gfsh>put --region=test --key="3" --value="3"
> {noformat}
> Debug logging shows:
> {noformat}
> [debug 2019/06/11 17:45:03.678 PDT <Function Execution Processor2> tid=0x3a]
> ParallelGatewaySenderOrderedQueue not putting key 164 : Value :
> GatewaySenderEventImpl[id=EventID[192.168.1.2(server)<v1>:41001;threadID=0x1010033|2;sequenceID=125;bucketID=51];action=0;operation=CREATE;region=/test;key=3;value=3;...]
> as shadowPR bucket is destroyed.
> {noformat}
> It comes down to this call in ParallelGatewaySenderQueue.put:
> {noformat}
> thisbucketDestroyed =
>     ((PartitionedRegion) prQ.getColocatedWithRegion()).getRegionAdvisor()
>         .getBucketAdvisor(bucketId).getShadowBucketDestroyed() || brq.isDestroyed();
> {noformat}
> The first condition (getShadowBucketDestroyed()) is true.
> Here is a stack that shows where shadowBucketDestroyed is set to true:
> {noformat}
> [warn 2019/06/12 16:32:47.066 PDT <Function Execution Processor2> tid=0x3a]
> XXX BucketAdvisor.setShadowBucketDestroyed destroyed=true
> java.lang.Exception
>   at org.apache.geode.internal.cache.BucketAdvisor.setShadowBucketDestroyed(BucketAdvisor.java:2820)
>   at org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1417)
>   at org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionLocally(PartitionedRegion.java:7520)
>   at org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionGlobally(PartitionedRegion.java:7376)
>   at org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegion(PartitionedRegion.java:7301)
>   at org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7630)
>   at org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2732)
>   at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6299)
>   at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6251)
>   at org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7077)
>   at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:453)
>   at org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:599)
>   at org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:555)
>   at org.apache.geode.management.internal.cli.functions.GatewaySenderDestroyFunction.execute(GatewaySenderDestroyFunction.java:60)
>   at org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:193)
>   at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369)
>   at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:435)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:960)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager.doFunctionExecutionThread(ClusterDistributionManager.java:814)
>   at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> PartitionedRegionDataStore.cleanUp is doing this:
> {noformat}
> // Fix for defect #49012
> if (buk instanceof AbstractBucketRegionQueue
>     && buk.getPartitionedRegion().isShadowPR()) {
>   if (buk.getPartitionedRegion().getColocatedWithRegion() != null) {
>     buk.getPartitionedRegion().getColocatedWithRegion().getRegionAdvisor()
>         .getBucketAdvisor(bucketId).setShadowBucketDestroyed(true);
>   }
> }
> {noformat}
> {{buk.getPartitionedRegion().getColocatedWithRegion()}} is the data region,
> which can have more than one shadow region (here, one per sender).
> So either this code has to check whether other shadow regions still exist
> before calling setShadowBucketDestroyed, or the BucketAdvisor's
> shadowBucketDestroyed state has to be maintained per shadow region rather
> than as a single boolean.
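The second option in the description (tracking the destroyed state per shadow region) could look roughly like the sketch below. PerShadowRegionFlags and its two-argument methods are hypothetical; they are not BucketAdvisor's existing API, and an actual fix in Geode may well take a different shape.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-shadow-region state for one data-region bucket; NOT Geode code.
final class PerShadowRegionFlags {

  // Paths of the shadow regions whose bucket has been destroyed.
  private final Set<String> destroyedShadowRegions = ConcurrentHashMap.newKeySet();

  /** Records or clears the destroyed state for one shadow region only. */
  void setShadowBucketDestroyed(String shadowRegionPath, boolean destroyed) {
    if (destroyed) {
      destroyedShadowRegions.add(shadowRegionPath);
    } else {
      destroyedShadowRegions.remove(shadowRegionPath);
    }
  }

  /** Answers for the given shadow region, independent of any other sender's queue. */
  boolean getShadowBucketDestroyed(String shadowRegionPath) {
    return destroyedShadowRegions.contains(shadowRegionPath);
  }

  public static void main(String[] args) {
    PerShadowRegionFlags bucket51 = new PerShadowRegionFlags();
    bucket51.setShadowBucketDestroyed("/mysender_PARALLEL_GATEWAY_SENDER_QUEUE", true);

    // mysender's queue is reported as destroyed, mysender2's is not.
    System.out.println(
        bucket51.getShadowBucketDestroyed("/mysender_PARALLEL_GATEWAY_SENDER_QUEUE"));
    System.out.println(
        bucket51.getShadowBucketDestroyed("/mysender2_PARALLEL_GATEWAY_SENDER_QUEUE"));
  }
}
```

With state keyed by shadow region path, the check in put would ask about its own queue's path, so destroying mysender would no longer affect mysender2.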