[ 
https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-7082:
-----------------------------
    Description: 
[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]

The run of the distributed test for java 8 failed because a timeout was 
exceeded.

I've attached the dunit-hangs.txt and callstack files from dunit #981. It looks 
like 
PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
 hung. One thread is trying to create buckets while another is trying to 
shutdown the cluster. Thread stacks are below.

There are several threads creating buckets and stuck waiting for replies to 
PrepareNewPersistentMemberMessage:

{noformat}
"Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 
tid=0x00007f29a0005800 nid=0x1a84 waiting on condition [0x00007f2bd28c5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e1fb4410> (a 
java.util.concurrent.CountDownLatch$Sync)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
        at 
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
        at 
org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
        at 
org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
        at 
org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
        at 
org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
        at 
org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
        at 
org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
        at 
org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
        at 
org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
        - locked <0x00000000e1fb4b28> (a 
org.apache.geode.internal.cache.ProxyBucketRegion)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
        at 
org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
        at 
org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
        at 
org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
        at 
org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
        at 
org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at 
org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961)
        at 
org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851)
        at 
org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$55/1122354621.invoke(Unknown
 Source)
        at 
org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
        at 
org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$51/717403135.run(Unknown
 Source)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00000000e1fb4ee8> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
{noformat}

There's one thread that appears to be stuck calling 
AdminDistributedSystemImpl.shutDownAllMembers (NOTE: I don't think we should 
have any tests using this deprecated internal-API to shutdown the cluster)::
{noformat}
      vm0.invoke(new SerializableCallable() {

        @Override
        public Object call() throws Exception {
          InternalDistributedSystem ds =
              (InternalDistributedSystem) getCache().getDistributedSystem();
          
AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 
600000);
          return null;
        }
      });
{noformat}

Here's the above thread's callstack:

{noformat}
"RMI TCP Connection(1)-172.17.0.25" #32 daemon prio=5 os_prio=0 
tid=0x00007f2ad4001800 nid=0x280 waiting on condition [0x00007f2be13b5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e1fb0600> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
        at 
org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lockInterruptibly(StoppableReentrantReadWriteLock.java:197)
        at 
org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lock(StoppableReentrantReadWriteLock.java:182)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1399)
        at 
org.apache.geode.internal.cache.PartitionedRegion.closePartitionedRegion(PartitionedRegion.java:7140)
        at 
org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7716)
        at 
org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2707)
        at 
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6175)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.shutDownOnePRGracefully(GemFireCacheImpl.java:1866)
        - locked <0x00000000e1fb0900> (a 
org.apache.geode.internal.cache.PRHARedundancyProvider)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.shutdownSubTreeGracefully(GemFireCacheImpl.java:1687)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.shutDownAll(GemFireCacheImpl.java:1745)
        - locked <0x00000000e01bfd78> (a java.lang.Class for 
org.apache.geode.internal.cache.GemFireCacheImpl)
        at 
org.apache.geode.internal.admin.remote.ShutdownAllRequest.createResponse(ShutdownAllRequest.java:181)
        at 
org.apache.geode.internal.admin.remote.ShutdownAllRequest.send(ShutdownAllRequest.java:88)
        at 
org.apache.geode.admin.internal.AdminDistributedSystemImpl.shutDownAllMembers(AdminDistributedSystemImpl.java:2342)
        at 
org.apache.geode.internal.cache.partitioned.PersistentColocatedPartitionedRegionDUnitTest$25.call(PersistentColocatedPartitionedRegionDUnitTest.java:1785)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
        at 
org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69)
        at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
        at sun.rmi.transport.Transport$1.run(Transport.java:200)
        at sun.rmi.transport.Transport$1.run(Transport.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
        at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$8/910555258.run(Unknown
 Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00000000e00ed098> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
{noformat}


  was:
[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]

The run of the distributed test for java 8 failed because a timeout was 
exceeded.

I've attached the dunit-hangs.txt and callstack files from dunit #981.

It looks like 
PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
 hung. 

One thread is trying to create buckets while another is trying to shutdown the 
cluster. Thread stacks are below.

There are several threads creating buckets and stuck waiting for replies to 
PrepareNewPersistentMemberMessage:

{noformat}
"Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 
tid=0x00007f29a0005800 nid=0x1a84 waiting on condition [0x00007f2bd28c5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e1fb4410> (a 
java.util.concurrent.CountDownLatch$Sync)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
        at 
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
        at 
org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
        at 
org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
        at 
org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
        at 
org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
        at 
org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
        at 
org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
        at 
org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
        at 
org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
        - locked <0x00000000e1fb4b28> (a 
org.apache.geode.internal.cache.ProxyBucketRegion)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
        at 
org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
        at 
org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
        at 
org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
        at 
org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
        at 
org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at 
org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961)
        at 
org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851)
        at 
org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$55/1122354621.invoke(Unknown
 Source)
        at 
org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
        at 
org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$51/717403135.run(Unknown
 Source)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00000000e1fb4ee8> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
{noformat}

There's one thread that appears to be stuck calling 
AdminDistributedSystemImpl.shutDownAllMembers (NOTE: I don't think we should 
have any tests using this deprecated internal-API to shutdown the cluster)::
{noformat}
      vm0.invoke(new SerializableCallable() {

        @Override
        public Object call() throws Exception {
          InternalDistributedSystem ds =
              (InternalDistributedSystem) getCache().getDistributedSystem();
          
AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 
600000);
          return null;
        }
      });
{noformat}

Here's the above thread's callstack:

{noformat}
"RMI TCP Connection(1)-172.17.0.25" #32 daemon prio=5 os_prio=0 
tid=0x00007f2ad4001800 nid=0x280 waiting on condition [0x00007f2be13b5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e1fb0600> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
        at 
org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lockInterruptibly(StoppableReentrantReadWriteLock.java:197)
        at 
org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lock(StoppableReentrantReadWriteLock.java:182)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1399)
        at 
org.apache.geode.internal.cache.PartitionedRegion.closePartitionedRegion(PartitionedRegion.java:7140)
        at 
org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7716)
        at 
org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2707)
        at 
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6175)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.shutDownOnePRGracefully(GemFireCacheImpl.java:1866)
        - locked <0x00000000e1fb0900> (a 
org.apache.geode.internal.cache.PRHARedundancyProvider)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.shutdownSubTreeGracefully(GemFireCacheImpl.java:1687)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.shutDownAll(GemFireCacheImpl.java:1745)
        - locked <0x00000000e01bfd78> (a java.lang.Class for 
org.apache.geode.internal.cache.GemFireCacheImpl)
        at 
org.apache.geode.internal.admin.remote.ShutdownAllRequest.createResponse(ShutdownAllRequest.java:181)
        at 
org.apache.geode.internal.admin.remote.ShutdownAllRequest.send(ShutdownAllRequest.java:88)
        at 
org.apache.geode.admin.internal.AdminDistributedSystemImpl.shutDownAllMembers(AdminDistributedSystemImpl.java:2342)
        at 
org.apache.geode.internal.cache.partitioned.PersistentColocatedPartitionedRegionDUnitTest$25.call(PersistentColocatedPartitionedRegionDUnitTest.java:1785)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
        at 
org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69)
        at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
        at sun.rmi.transport.Transport$1.run(Transport.java:200)
        at sun.rmi.transport.Transport$1.run(Transport.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
        at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$8/910555258.run(Unknown
 Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00000000e00ed098> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
{noformat}



> PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
>  hung or took too long 
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-7082
>                 URL: https://issues.apache.org/jira/browse/GEODE-7082
>             Project: Geode
>          Issue Type: Bug
>          Components: tests
>            Reporter: Mark Hanson
>            Assignee: Jason Huynh
>            Priority: Major
>         Attachments: callstacks-2019-08-12-19-22-55.txt, 
> callstacks-2019-08-12-19-23-06.txt, callstacks-2019-08-12-19-23-16.txt, 
> dunit-hangs.txt
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]
> The run of the distributed test for java 8 failed because a timeout was 
> exceeded.
> I've attached the dunit-hangs.txt and callstack files from dunit #981. It 
> looks like 
> PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
>  hung. One thread is trying to create buckets while another is trying to 
> shutdown the cluster. Thread stacks are below.
> There are several threads creating buckets and stuck waiting for replies to 
> PrepareNewPersistentMemberMessage:
> {noformat}
> "Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 
> tid=0x00007f29a0005800 nid=0x1a84 waiting on condition [0x00007f2bd28c5000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000e1fb4410> (a 
> java.util.concurrent.CountDownLatch$Sync)
>         at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at 
> org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
>         at 
> org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
>         at 
> org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
>         at 
> org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
>         at 
> org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
>         at 
> org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
>         at 
> org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
>         at 
> org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
>         at 
> org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
>         at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
>         at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
>         - locked <0x00000000e1fb4b28> (a 
> org.apache.geode.internal.cache.ProxyBucketRegion)
>         at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
>         at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
>         at 
> org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
>         at 
> org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
>         at 
> org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
>         at 
> org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
>         at 
> org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961)
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851)
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$55/1122354621.invoke(Unknown
>  Source)
>         at 
> org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
>         at 
> org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$51/717403135.run(Unknown
>  Source)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x00000000e1fb4ee8> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> There's one thread that appears to be stuck calling 
> AdminDistributedSystemImpl.shutDownAllMembers (NOTE: I don't think we should 
> have any tests using this deprecated internal-API to shutdown the cluster)::
> {noformat}
>       vm0.invoke(new SerializableCallable() {
>         @Override
>         public Object call() throws Exception {
>           InternalDistributedSystem ds =
>               (InternalDistributedSystem) getCache().getDistributedSystem();
>           
> AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 
> 600000);
>           return null;
>         }
>       });
> {noformat}
> Here's the above thread's callstack:
> {noformat}
> "RMI TCP Connection(1)-172.17.0.25" #32 daemon prio=5 os_prio=0 
> tid=0x00007f2ad4001800 nid=0x280 waiting on condition [0x00007f2be13b5000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000e1fb0600> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
>         at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
>         at 
> org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lockInterruptibly(StoppableReentrantReadWriteLock.java:197)
>         at 
> org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lock(StoppableReentrantReadWriteLock.java:182)
>         at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1399)
>         at 
> org.apache.geode.internal.cache.PartitionedRegion.closePartitionedRegion(PartitionedRegion.java:7140)
>         at 
> org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7716)
>         at 
> org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2707)
>         at 
> org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6175)
>         at 
> org.apache.geode.internal.cache.GemFireCacheImpl.shutDownOnePRGracefully(GemFireCacheImpl.java:1866)
>         - locked <0x00000000e1fb0900> (a 
> org.apache.geode.internal.cache.PRHARedundancyProvider)
>         at 
> org.apache.geode.internal.cache.GemFireCacheImpl.shutdownSubTreeGracefully(GemFireCacheImpl.java:1687)
>         at 
> org.apache.geode.internal.cache.GemFireCacheImpl.shutDownAll(GemFireCacheImpl.java:1745)
>         - locked <0x00000000e01bfd78> (a java.lang.Class for 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>         at 
> org.apache.geode.internal.admin.remote.ShutdownAllRequest.createResponse(ShutdownAllRequest.java:181)
>         at 
> org.apache.geode.internal.admin.remote.ShutdownAllRequest.send(ShutdownAllRequest.java:88)
>         at 
> org.apache.geode.admin.internal.AdminDistributedSystemImpl.shutDownAllMembers(AdminDistributedSystemImpl.java:2342)
>         at 
> org.apache.geode.internal.cache.partitioned.PersistentColocatedPartitionedRegionDUnitTest$25.call(PersistentColocatedPartitionedRegionDUnitTest.java:1785)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
>         at 
> org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69)
>         at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>         at sun.rmi.transport.Transport$1.run(Transport.java:200)
>         at sun.rmi.transport.Transport$1.run(Transport.java:197)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>         at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
>         at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
>         at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
>         at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$8/910555258.run(Unknown
>  Source)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x00000000e00ed098> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Reply via email to