[jira] [Commented] (GEODE-6859) Destroying a parallel gateway sender attached to a region causes other senders attached to that same region to no longer queue events

Barry Oglesby (JIRA) Thu, 13 Jun 2019 11:20:30 -0700


    [ 
https://issues.apache.org/jira/browse/GEODE-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863350#comment-16863350
 ]


Barry Oglesby commented on GEODE-6859:
--------------------------------------

Here is some additional logging showing the behavior:

The shadow PR for GatewaySender mysender is created:
{noformat}
[warn 2019/06/13 10:24:36.546 PDT <Function Execution Processor2> tid=0x3a] XXX 
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR 
senderId=mysender; userPR=/test
[warn 2019/06/13 10:24:36.546 PDT <Function Execution Processor2> tid=0x3a] XXX 
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR 
senderId=mysender; prQName=mysender_PARALLEL_GATEWAY_SENDER_QUEUE; prQ=null
[warn 2019/06/13 10:24:36.597 PDT <Function Execution Processor2> tid=0x3a] XXX 
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR created queue 
senderId=mysender; prQName=mysender_PARALLEL_GATEWAY_SENDER_QUEUE; 
prQ=Partitioned Region @7951061f 
[path='/mysender_PARALLEL_GATEWAY_SENDER_QUEUE'; dataPolicy=PARTITION; prId=2; 
isDestroyed=false; isClosed=false; retryTimeout=3600000; serialNumber=125; 
partition 
attributes=PartitionAttributes@639507262[redundantCopies=0;localMaxMemory=100;totalMaxMemory=2147483647;totalNumBuckets=113;partitionResolver=null;colocatedWith=/test;recoveryDelay=-1;startupRecoveryDelay=0;FixedPartitionAttributes=null;partitionListeners=null];
 on VM 192.168.1.2(server:4637)<v1>:41001]
{noformat}
The shadow PR for GatewaySender mysender2 is created:
{noformat}
[warn 2019/06/13 10:24:43.064 PDT <Function Execution Processor2> tid=0x3a] XXX 
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR 
senderId=mysender2; userPR=/test
[warn 2019/06/13 10:24:43.064 PDT <Function Execution Processor2> tid=0x3a] XXX 
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR 
senderId=mysender2; prQName=mysender2_PARALLEL_GATEWAY_SENDER_QUEUE; prQ=null
[warn 2019/06/13 10:24:43.069 PDT <Function Execution Processor2> tid=0x3a] XXX 
ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR created queue 
senderId=mysender2; prQName=mysender2_PARALLEL_GATEWAY_SENDER_QUEUE; 
prQ=Partitioned Region @1c5b3979 
[path='/mysender2_PARALLEL_GATEWAY_SENDER_QUEUE'; dataPolicy=PARTITION; prId=3; 
isDestroyed=false; isClosed=false; retryTimeout=3600000; serialNumber=466; 
partition 
attributes=PartitionAttributes@635010394[redundantCopies=0;localMaxMemory=100;totalMaxMemory=2147483647;totalNumBuckets=113;partitionResolver=null;colocatedWith=/test;recoveryDelay=-1;startupRecoveryDelay=0;FixedPartitionAttributes=null;partitionListeners=null];
 on VM 192.168.1.2(server:4637)<v1>:41001]
{noformat}
GatewaySender mysender is destroyed:
{noformat}
[warn 2019/06/13 10:24:43.889 PDT <Function Execution Processor2> tid=0x3a] XXX 
AbstractGatewaySender.destroy region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE
[warn 2019/06/13 10:24:43.889 PDT <Function Execution Processor2> tid=0x3a] XXX 
PartitionedRegion.destroyRegion region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE
{noformat}
That causes PartitionedRegionDataStore.cleanUp to set shadowBucketDestroyed to 
true for all the buckets of the test region:
{noformat}
[warn 2019/06/13 10:24:43.890 PDT <Function Execution Processor2> tid=0x3a] XXX 
PartitionedRegionDataStore.cleanUp 
region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE
[warn 2019/06/13 10:24:43.895 PDT <Function Execution Processor2> tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=0; destroyed=true
[warn 2019/06/13 10:24:43.896 PDT <Function Execution Processor2> tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=1; destroyed=true
[warn 2019/06/13 10:24:43.897 PDT <Function Execution Processor2> tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=2; destroyed=true
[warn 2019/06/13 10:24:43.898 PDT <Function Execution Processor2> tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=3; destroyed=true
[warn 2019/06/13 10:24:43.899 PDT <Function Execution Processor2> tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=4; destroyed=true
[warn 2019/06/13 10:24:43.899 PDT <Function Execution Processor2> tid=0x3a] ...
[warn 2019/06/13 10:24:43.942 PDT <Function Execution Processor2> tid=0x3a] XXX 
BucketAdvisor.setShadowBucketDestroyed region=/test; bucket=51; destroyed=true
[warn 2019/06/13 10:24:43.942 PDT <Function Execution Processor2> tid=0x3a] ...
[warn 2019/06/13 10:24:43.959 PDT <Function Execution Processor2> tid=0x3a] XXX 
PartitionedRegionDataStore.cleanUp complete 
region=/mysender_PARALLEL_GATEWAY_SENDER_QUEUE
{noformat}
The put is delivered to the ParallelGatewaySenderQueue for mysender2, but 
shadowBucketDestroyed is still true from the cleanUp above, so the event is 
dropped:
{noformat}
[warn 2019/06/13 10:24:44.011 PDT <Function Execution Processor2> tid=0x3a] XXX 
ParallelGatewaySenderQueue.put 
brq=/__PR/_B__mysender2__PARALLEL__GATEWAY__SENDER__QUEUE_51
[warn 2019/06/13 10:24:44.012 PDT <Function Execution Processor2> tid=0x3a] XXX 
ParallelGatewaySenderQueue.put 
brq=/__PR/_B__mysender2__PARALLEL__GATEWAY__SENDER__QUEUE_51; 
shadowBucketDestroyed=true
[warn 2019/06/13 10:24:44.012 PDT <Function Execution Processor2> tid=0x3a] XXX 
ParallelGatewaySenderQueue.put not putting entry into queue as shadowPR bucket 
is destroyed: key=164; value=GatewaySenderEventImpl[id=EventID[id=24 
bytes;threadID=0x1010033|1;sequenceID=122;bucketId=51];action=0;operation=CREATE;region=/test;key=3;value=3;valueIsObject=1;numberOfParts=9;callbackArgument=GatewaySenderEventCallbackArgument [originalCallbackArg=null;originatingSenderId=-1;recipientGatewayReceivers={4}];possibleDuplicate=false;creationTime=1560446684011;shadowKey=164;timeStamp=1560446684010;acked=false;dispatched=false;bucketId=51;isConcurrencyConflict=false]
{noformat}
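
The sequence in the logs above can be reproduced with a toy model. This is only a sketch: SharedFlagBucket, ToySenderQueue and SharedFlagDemo are hypothetical names, none of this uses Geode classes, and it assumes nothing beyond what the logs show (one destroyed flag per data-region bucket, set when any colocated queue is destroyed and read on every put).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes, not Geode APIs.
final class SharedFlagBucket {
  // One flag per data-region bucket, shared by every colocated shadow queue.
  boolean shadowBucketDestroyed = false;
}

final class ToySenderQueue {
  final String senderId;
  final SharedFlagBucket dataBucket;
  final List<String> queued = new ArrayList<>();

  ToySenderQueue(String senderId, SharedFlagBucket dataBucket) {
    this.senderId = senderId;
    this.dataBucket = dataBucket;
  }

  // Mirrors the cleanUp step: destroying this queue flags the data-region bucket.
  void destroy() {
    dataBucket.shadowBucketDestroyed = true;
  }

  // Mirrors the check in put: the event is dropped when the shared flag is set.
  boolean put(String event) {
    if (dataBucket.shadowBucketDestroyed) {
      return false;
    }
    queued.add(event);
    return true;
  }
}

final class SharedFlagDemo {
  public static void main(String[] args) {
    SharedFlagBucket bucket51 = new SharedFlagBucket();
    ToySenderQueue mysender = new ToySenderQueue("mysender", bucket51);
    ToySenderQueue mysender2 = new ToySenderQueue("mysender2", bucket51);

    mysender.destroy();
    // mysender2 was never destroyed, yet its put is rejected.
    System.out.println("queued by mysender2: " + mysender2.put("key=3;value=3"));
  }
}
```

Running SharedFlagDemo prints "queued by mysender2: false", which matches the dropped put for bucket 51 above.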

> Destroying a parallel gateway sender attached to a region causes other 
> senders attached to that same region to no longer queue events
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-6859
>                 URL: https://issues.apache.org/jira/browse/GEODE-6859
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: Barry Oglesby
>            Priority: Major
>
> This scenario causes the event to not be put into the queue:
>  - create gateway sender sender1
>  - create region attached to sender1
>  - create gateway sender sender2
>  - alter region to be attached to sender2
>  - destroy sender1
>  - put an entry into region
> Here are the steps using gfsh:
> {noformat}
> gfsh>create gateway-sender --id=mysender --remote-distributed-system-id=2 
> --enable-persistence --parallel
> gfsh>create region --name=test --gateway-sender-id=mysender 
> --type=PARTITION_PERSISTENT
> gfsh>create gateway-sender --id=mysender2 --remote-distributed-system-id=4 
> --enable-persistence --parallel
> gfsh>alter region --name=test --gateway-sender-id=mysender2
> gfsh>destroy gateway-sender --id=mysender
> gfsh>put --region=test --key="3" --value="3"
> {noformat}
> Debug logging shows:
> {noformat}
> [debug 2019/06/11 17:45:03.678 PDT <Function Execution Processor2> tid=0x3a] 
> ParallelGatewaySenderOrderedQueue not putting key 164 : Value : 
> GatewaySenderEventImpl[id=EventID[192.168.1.2(server)<v1>:41001;threadID=0x1010033|2;sequenceID=125;bucketID=51];action=0;operation=CREATE;region=/test;key=3;value=3;...]
>  as shadowPR bucket is destroyed.
> {noformat}
> It comes down to this call in ParallelGatewaySenderQueue.put:
> {noformat}
> thisbucketDestroyed =
>  ((PartitionedRegion) prQ.getColocatedWithRegion()).getRegionAdvisor()
>  .getBucketAdvisor(bucketId).getShadowBucketDestroyed() || brq.isDestroyed();
> {noformat}
> The first condition (getShadowBucketDestroyed() on the data region's 
> BucketAdvisor) is true.
> Here is a stack that shows where shadowBucketDestroyed is set to true:
> {noformat}
> [warn 2019/06/12 16:32:47.066 PDT <Function Execution Processor2> tid=0x3a] 
> XXX BucketAdvisor.setShadowBucketDestroyed destroyed=true
> java.lang.Exception
>  at 
> org.apache.geode.internal.cache.BucketAdvisor.setShadowBucketDestroyed(BucketAdvisor.java:2820)
>  at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1417)
>  at 
> org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionLocally(PartitionedRegion.java:7520)
>  at 
> org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegionGlobally(PartitionedRegion.java:7376)
>  at 
> org.apache.geode.internal.cache.PartitionedRegion.destroyPartitionedRegion(PartitionedRegion.java:7301)
>  at 
> org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7630)
>  at 
> org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2732)
>  at 
> org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6299)
>  at 
> org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6251)
>  at 
> org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7077)
>  at 
> org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:453)
>  at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:599)
>  at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySender.destroy(AbstractGatewaySender.java:555)
>  at 
> org.apache.geode.management.internal.cli.functions.GatewaySenderDestroyFunction.execute(GatewaySenderDestroyFunction.java:60)
>  at 
> org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:193)
>  at 
> org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369)
>  at 
> org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:435)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:960)
>  at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.doFunctionExecutionThread(ClusterDistributionManager.java:814)
>  at 
> org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
> PartitionedRegionDataStore.cleanUp is doing this:
> {noformat}
> // Fix for defect #49012
> if (buk instanceof AbstractBucketRegionQueue
>  && buk.getPartitionedRegion().isShadowPR()) {
>  if (buk.getPartitionedRegion().getColocatedWithRegion() != null) {
>  buk.getPartitionedRegion().getColocatedWithRegion().getRegionAdvisor()
>  .getBucketAdvisor(bucketId).setShadowBucketDestroyed(true);
>  }
> }
> {noformat}
> {{buk.getPartitionedRegion().getColocatedWithRegion()}} is the data region, 
> which can have more than one shadow region.
> So either this code has to check whether other shadow regions still exist 
> before calling setShadowBucketDestroyed, or the BucketAdvisor's 
> shadowBucketDestroyed has to be maintained per shadow region rather than as a 
> single boolean.
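
The second option in the description (tracking the destroyed state per shadow region) could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual Geode change: PerShadowRegionBucketState is a hypothetical stand-in for the relevant part of BucketAdvisor, the method names are invented for the example, and keying by the shadow region's full path is an assumption.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the per-bucket state; not the Geode BucketAdvisor.
final class PerShadowRegionBucketState {
  // Full paths of shadow (queue) regions whose bucket has been destroyed.
  private final Set<String> destroyedShadowRegions = ConcurrentHashMap.newKeySet();

  // Called when one shadow region's bucket is cleaned up.
  void markShadowBucketDestroyed(String shadowRegionPath) {
    destroyedShadowRegions.add(shadowRegionPath);
  }

  // Called when a shadow region's bucket is (re)created.
  void markShadowBucketCreated(String shadowRegionPath) {
    destroyedShadowRegions.remove(shadowRegionPath);
  }

  // The put path would ask only about its own shadow region.
  boolean isShadowBucketDestroyed(String shadowRegionPath) {
    return destroyedShadowRegions.contains(shadowRegionPath);
  }
}

final class PerShadowRegionDemo {
  public static void main(String[] args) {
    PerShadowRegionBucketState bucket51 = new PerShadowRegionBucketState();
    bucket51.markShadowBucketDestroyed("/mysender_PARALLEL_GATEWAY_SENDER_QUEUE");

    // Prints "mysender destroyed: true"
    System.out.println("mysender destroyed: "
        + bucket51.isShadowBucketDestroyed("/mysender_PARALLEL_GATEWAY_SENDER_QUEUE"));
    // Prints "mysender2 destroyed: false"
    System.out.println("mysender2 destroyed: "
        + bucket51.isShadowBucketDestroyed("/mysender2_PARALLEL_GATEWAY_SENDER_QUEUE"));
  }
}
```

With state kept per shadow region, destroying mysender no longer affects the answer mysender2 gets for the same data-region bucket. The first option (checking for other shadow regions before setting the flag) would keep the single boolean, but would need a reliable way to enumerate the remaining colocated shadow regions at cleanUp time.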



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
