[ 
https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-7082:
-----------------------------
    Description: 
[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]

The run of the distributed test for java 8 failed because a timeout was 
exceeded.

It looks like 
PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
 hung. 

One thread is trying to create buckets while another is trying to shutdown the 
cluster. Thread stacks are below.

The thread trying to shutdown the cluster is using an old deprecated internal 
method rather than going through a current API:
{noformat}
      vm0.invoke(new SerializableCallable() {

        @Override
        public Object call() throws Exception {
          InternalDistributedSystem ds =
              (InternalDistributedSystem) getCache().getDistributedSystem();
          
AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 
600000);
          return null;
        }
      });
{noformat}

There are several threads creating buckets and stuck waiting for replies to 
PrepareNewPersistentMemberMessage:
{noformat}
"Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 
tid=0x00007f29a0005800 nid=0x1a84 waiting on condition [0x00007f2bd28c5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e1fb4410> (a 
java.util.concurrent.CountDownLatch$Sync)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
        at 
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
        at 
org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
        at 
org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
        at 
org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
        at 
org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
        at 
org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
        at 
org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
        at 
org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
        at 
org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
        - locked <0x00000000e1fb4b28> (a 
org.apache.geode.internal.cache.ProxyBucketRegion)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
        at 
org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
        at 
org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
        at 
org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
        at 
org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
        at 
org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at 
org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961)
        at 
org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851)
        at 
org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$55/1122354621.invoke(Unknown
 Source)
        at 
org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
        at 
org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$51/717403135.run(Unknown
 Source)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00000000e1fb4ee8> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
{noformat}

There's one thread that appears to be stuck calling 
AdminDistributedSystemImpl.shutDownAllMembers (NOTE: I don't think we should 
have any tests using this deprecated internal-API to shutdown the cluster):
{noformat}
"RMI TCP Connection(1)-172.17.0.25" #32 daemon prio=5 os_prio=0 
tid=0x00007f2ad4001800 nid=0x280 waiting on condition [0x00007f2be13b5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e1fb0600> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
        at 
org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lockInterruptibly(StoppableReentrantReadWriteLock.java:197)
        at 
org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lock(StoppableReentrantReadWriteLock.java:182)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1399)
        at 
org.apache.geode.internal.cache.PartitionedRegion.closePartitionedRegion(PartitionedRegion.java:7140)
        at 
org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7716)
        at 
org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2707)
        at 
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6175)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.shutDownOnePRGracefully(GemFireCacheImpl.java:1866)
        - locked <0x00000000e1fb0900> (a 
org.apache.geode.internal.cache.PRHARedundancyProvider)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.shutdownSubTreeGracefully(GemFireCacheImpl.java:1687)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.shutDownAll(GemFireCacheImpl.java:1745)
        - locked <0x00000000e01bfd78> (a java.lang.Class for 
org.apache.geode.internal.cache.GemFireCacheImpl)
        at 
org.apache.geode.internal.admin.remote.ShutdownAllRequest.createResponse(ShutdownAllRequest.java:181)
        at 
org.apache.geode.internal.admin.remote.ShutdownAllRequest.send(ShutdownAllRequest.java:88)
        at 
org.apache.geode.admin.internal.AdminDistributedSystemImpl.shutDownAllMembers(AdminDistributedSystemImpl.java:2342)
        at 
org.apache.geode.internal.cache.partitioned.PersistentColocatedPartitionedRegionDUnitTest$25.call(PersistentColocatedPartitionedRegionDUnitTest.java:1785)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
        at 
org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69)
        at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
        at sun.rmi.transport.Transport$1.run(Transport.java:200)
        at sun.rmi.transport.Transport$1.run(Transport.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
        at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$8/910555258.run(Unknown
 Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00000000e00ed098> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
{noformat}


  was:
[https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]

The run of the distributed test for java 8 failed because a timeout was 
exceeded.

It looks like 
PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
 hung. 

One thread is trying to create buckets while another is trying to shutdown the 
cluster. Thread stacks are below.

The thread trying to shutdown the cluster is using an old deprecated internal 
method rather than going through a current API:
{noformat}
      vm0.invoke(new SerializableCallable() {

        @Override
        public Object call() throws Exception {
          InternalDistributedSystem ds =
              (InternalDistributedSystem) getCache().getDistributedSystem();
          
AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 
600000);
          return null;
        }
      });
{noformat}

There are several threads creating buckets and stuck waiting for replies to 
PrepareNewPersistentMemberMessage:
{noformat}
"Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 
tid=0x00007f29a0005800 nid=0x1a84 waiting on condition [0x00007f2bd28c5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e1fb4410> (a 
java.util.concurrent.CountDownLatch$Sync)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
        at 
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
        at 
org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
        at 
org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
        at 
org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
        at 
org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
        at 
org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
        at 
org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
        at 
org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
        at 
org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
        - locked <0x00000000e1fb4b28> (a 
org.apache.geode.internal.cache.ProxyBucketRegion)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
        at 
org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
        at 
org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
        at 
org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
        at 
org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
        at 
org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at 
org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961)
        at 
org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851)
        at 
org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$55/1122354621.invoke(Unknown
 Source)
        at 
org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
        at 
org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$51/717403135.run(Unknown
 Source)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00000000e1fb4ee8> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
{noformat}

Here's the thread stuck calling AdminDistributedSystemImpl.shutDownAllMembers:
{noformat}
"RMI TCP Connection(1)-172.17.0.25" #32 daemon prio=5 os_prio=0 
tid=0x00007f2ad4001800 nid=0x280 waiting on condition [0x00007f2be13b5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e1fb0600> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
        at 
org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lockInterruptibly(StoppableReentrantReadWriteLock.java:197)
        at 
org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lock(StoppableReentrantReadWriteLock.java:182)
        at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1399)
        at 
org.apache.geode.internal.cache.PartitionedRegion.closePartitionedRegion(PartitionedRegion.java:7140)
        at 
org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7716)
        at 
org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2707)
        at 
org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6175)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.shutDownOnePRGracefully(GemFireCacheImpl.java:1866)
        - locked <0x00000000e1fb0900> (a 
org.apache.geode.internal.cache.PRHARedundancyProvider)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.shutdownSubTreeGracefully(GemFireCacheImpl.java:1687)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.shutDownAll(GemFireCacheImpl.java:1745)
        - locked <0x00000000e01bfd78> (a java.lang.Class for 
org.apache.geode.internal.cache.GemFireCacheImpl)
        at 
org.apache.geode.internal.admin.remote.ShutdownAllRequest.createResponse(ShutdownAllRequest.java:181)
        at 
org.apache.geode.internal.admin.remote.ShutdownAllRequest.send(ShutdownAllRequest.java:88)
        at 
org.apache.geode.admin.internal.AdminDistributedSystemImpl.shutDownAllMembers(AdminDistributedSystemImpl.java:2342)
        at 
org.apache.geode.internal.cache.partitioned.PersistentColocatedPartitionedRegionDUnitTest$25.call(PersistentColocatedPartitionedRegionDUnitTest.java:1785)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
        at 
org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69)
        at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
        at sun.rmi.transport.Transport$1.run(Transport.java:200)
        at sun.rmi.transport.Transport$1.run(Transport.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
        at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$8/910555258.run(Unknown
 Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00000000e00ed098> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
{noformat}



> PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
>  hung or took too long 
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-7082
>                 URL: https://issues.apache.org/jira/browse/GEODE-7082
>             Project: Geode
>          Issue Type: Bug
>          Components: tests
>            Reporter: Mark Hanson
>            Assignee: Jason Huynh
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981]
> The run of the distributed test for java 8 failed because a timeout was 
> exceeded.
> It looks like 
> PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate
>  hung. 
> One thread is trying to create buckets while another is trying to shutdown 
> the cluster. Thread stacks are below.
> The thread trying to shutdown the cluster is using an old deprecated internal 
> method rather than going through a current API:
> {noformat}
>       vm0.invoke(new SerializableCallable() {
>         @Override
>         public Object call() throws Exception {
>           InternalDistributedSystem ds =
>               (InternalDistributedSystem) getCache().getDistributedSystem();
>           
> AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 
> 600000);
>           return null;
>         }
>       });
> {noformat}
> There are several threads creating buckets and stuck waiting for replies to 
> PrepareNewPersistentMemberMessage:
> {noformat}
> "Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 
> tid=0x00007f29a0005800 nid=0x1a84 waiting on condition [0x00007f2bd28c5000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000e1fb4410> (a 
> java.util.concurrent.CountDownLatch$Sync)
>         at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at 
> org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780)
>         at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866)
>         at 
> org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79)
>         at 
> org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460)
>         at 
> org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416)
>         at 
> org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324)
>         at 
> org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219)
>         at 
> org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079)
>         at 
> org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259)
>         at 
> org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980)
>         at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783)
>         at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458)
>         - locked <0x00000000e1fb4b28> (a 
> org.apache.geode.internal.cache.ProxyBucketRegion)
>         at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317)
>         at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897)
>         at 
> org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086)
>         at 
> org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513)
>         at 
> org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54)
>         at 
> org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100)
>         at 
> org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961)
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851)
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$55/1122354621.invoke(Unknown
>  Source)
>         at 
> org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
>         at 
> org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$51/717403135.run(Unknown
>  Source)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x00000000e1fb4ee8> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> There's one thread that appears to be stuck calling 
> AdminDistributedSystemImpl.shutDownAllMembers (NOTE: I don't think we should 
> have any tests using this deprecated internal-API to shutdown the cluster):
> {noformat}
> "RMI TCP Connection(1)-172.17.0.25" #32 daemon prio=5 os_prio=0 
> tid=0x00007f2ad4001800 nid=0x280 waiting on condition [0x00007f2be13b5000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000e1fb0600> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
>         at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
>         at 
> org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lockInterruptibly(StoppableReentrantReadWriteLock.java:197)
>         at 
> org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lock(StoppableReentrantReadWriteLock.java:182)
>         at 
> org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1399)
>         at 
> org.apache.geode.internal.cache.PartitionedRegion.closePartitionedRegion(PartitionedRegion.java:7140)
>         at 
> org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7716)
>         at 
> org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2707)
>         at 
> org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6175)
>         at 
> org.apache.geode.internal.cache.GemFireCacheImpl.shutDownOnePRGracefully(GemFireCacheImpl.java:1866)
>         - locked <0x00000000e1fb0900> (a 
> org.apache.geode.internal.cache.PRHARedundancyProvider)
>         at 
> org.apache.geode.internal.cache.GemFireCacheImpl.shutdownSubTreeGracefully(GemFireCacheImpl.java:1687)
>         at 
> org.apache.geode.internal.cache.GemFireCacheImpl.shutDownAll(GemFireCacheImpl.java:1745)
>         - locked <0x00000000e01bfd78> (a java.lang.Class for 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>         at 
> org.apache.geode.internal.admin.remote.ShutdownAllRequest.createResponse(ShutdownAllRequest.java:181)
>         at 
> org.apache.geode.internal.admin.remote.ShutdownAllRequest.send(ShutdownAllRequest.java:88)
>         at 
> org.apache.geode.admin.internal.AdminDistributedSystemImpl.shutDownAllMembers(AdminDistributedSystemImpl.java:2342)
>         at 
> org.apache.geode.internal.cache.partitioned.PersistentColocatedPartitionedRegionDUnitTest$25.call(PersistentColocatedPartitionedRegionDUnitTest.java:1785)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
>         at 
> org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69)
>         at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>         at sun.rmi.transport.Transport$1.run(Transport.java:200)
>         at sun.rmi.transport.Transport$1.run(Transport.java:197)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>         at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
>         at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
>         at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
>         at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$8/910555258.run(Unknown
>  Source)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x00000000e00ed098> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Reply via email to