[ https://issues.apache.org/jira/browse/GEODE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kirk Lund updated GEODE-7082: ----------------------------- Description: [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981] The run of the distributed test for java 8 failed because a timeout was exceeded. I've attached the dunit-hangs.txt and callstack files from dunit #981. It looks like PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung. One thread is trying to create buckets while another is trying to shutdown the cluster. Thread stacks are below. There are several threads creating buckets and stuck waiting for replies to PrepareNewPersistentMemberMessage: {noformat} "Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 tid=0x00007f29a0005800 nid=0x1a84 waiting on condition [0x00007f2bd28c5000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000000e1fb4410> (a java.util.concurrent.CountDownLatch$Sync) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72) at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732) at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803) at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780) at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866) at org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79) at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460) at org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416) at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324) at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219) at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079) at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259) at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980) at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783) at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458) - locked <0x00000000e1fb4b28> (a org.apache.geode.internal.cache.ProxyBucketRegion) at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317) at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897) at org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086) at org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513) at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54) at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100) at org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961) at org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851) at org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$55/1122354621.invoke(Unknown Source) at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121) at org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$51/717403135.run(Unknown Source) at java.lang.Thread.run(Thread.java:748) Locked ownable synchronizers: - <0x00000000e1fb4ee8> (a java.util.concurrent.ThreadPoolExecutor$Worker) {noformat} There's one thread that appears to be stuck calling AdminDistributedSystemImpl.shutDownAllMembers (NOTE: I don't think we should have any tests using this deprecated internal-API to shutdown the cluster):: {noformat} vm0.invoke(new SerializableCallable() { @Override public Object call() throws Exception { InternalDistributedSystem ds = (InternalDistributedSystem) getCache().getDistributedSystem(); AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 600000); return null; } }); {noformat} Here's the above thread's callstack: {noformat} "RMI TCP Connection(1)-172.17.0.25" #32 daemon prio=5 os_prio=0 tid=0x00007f2ad4001800 nid=0x280 waiting on condition [0x00007f2be13b5000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000000e1fb0600> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934) at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247) at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115) at org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lockInterruptibly(StoppableReentrantReadWriteLock.java:197) at org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lock(StoppableReentrantReadWriteLock.java:182) at org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1399) at org.apache.geode.internal.cache.PartitionedRegion.closePartitionedRegion(PartitionedRegion.java:7140) at org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7716) at org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2707) at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6175) at org.apache.geode.internal.cache.GemFireCacheImpl.shutDownOnePRGracefully(GemFireCacheImpl.java:1866) - locked <0x00000000e1fb0900> (a org.apache.geode.internal.cache.PRHARedundancyProvider) at org.apache.geode.internal.cache.GemFireCacheImpl.shutdownSubTreeGracefully(GemFireCacheImpl.java:1687) at org.apache.geode.internal.cache.GemFireCacheImpl.shutDownAll(GemFireCacheImpl.java:1745) - locked <0x00000000e01bfd78> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl) at org.apache.geode.internal.admin.remote.ShutdownAllRequest.createResponse(ShutdownAllRequest.java:181) at org.apache.geode.internal.admin.remote.ShutdownAllRequest.send(ShutdownAllRequest.java:88) at org.apache.geode.admin.internal.AdminDistributedSystemImpl.shutDownAllMembers(AdminDistributedSystemImpl.java:2342) at org.apache.geode.internal.cache.partitioned.PersistentColocatedPartitionedRegionDUnitTest$25.call(PersistentColocatedPartitionedRegionDUnitTest.java:1785) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123) at org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69) at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357) at sun.rmi.transport.Transport$1.run(Transport.java:200) at sun.rmi.transport.Transport$1.run(Transport.java:197) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:196) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$8/910555258.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Locked ownable synchronizers: - <0x00000000e00ed098> (a java.util.concurrent.ThreadPoolExecutor$Worker) {noformat} was: [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981] The run of the distributed test for java 8 failed because a timeout was exceeded. I've attached the dunit-hangs.txt and callstack files from dunit #981. It looks like PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate hung. One thread is trying to create buckets while another is trying to shutdown the cluster. Thread stacks are below. There are several threads creating buckets and stuck waiting for replies to PrepareNewPersistentMemberMessage: {noformat} "Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 tid=0x00007f29a0005800 nid=0x1a84 waiting on condition [0x00007f2bd28c5000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000000e1fb4410> (a java.util.concurrent.CountDownLatch$Sync) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72) at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732) at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803) at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780) at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866) at org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79) at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460) at org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416) at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324) at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219) at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079) at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259) at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980) at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783) at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458) - locked <0x00000000e1fb4b28> (a org.apache.geode.internal.cache.ProxyBucketRegion) at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317) at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897) at org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086) at org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513) at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54) at org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100) at org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961) at org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851) at org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$55/1122354621.invoke(Unknown Source) at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121) at org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$51/717403135.run(Unknown Source) at java.lang.Thread.run(Thread.java:748) Locked ownable synchronizers: - <0x00000000e1fb4ee8> (a java.util.concurrent.ThreadPoolExecutor$Worker) {noformat} There's one thread that appears to be stuck calling AdminDistributedSystemImpl.shutDownAllMembers (NOTE: I don't think we should have any tests using this deprecated internal-API to shutdown the cluster):: {noformat} vm0.invoke(new SerializableCallable() { @Override public Object call() throws Exception { InternalDistributedSystem ds = (InternalDistributedSystem) getCache().getDistributedSystem(); AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), 600000); return null; } }); {noformat} Here's the above thread's callstack: {noformat} "RMI TCP Connection(1)-172.17.0.25" #32 daemon prio=5 os_prio=0 tid=0x00007f2ad4001800 nid=0x280 waiting on condition [0x00007f2be13b5000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000000e1fb0600> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934) at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247) at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115) at org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lockInterruptibly(StoppableReentrantReadWriteLock.java:197) at org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lock(StoppableReentrantReadWriteLock.java:182) at org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1399) at org.apache.geode.internal.cache.PartitionedRegion.closePartitionedRegion(PartitionedRegion.java:7140) at org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7716) at org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2707) at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6175) at org.apache.geode.internal.cache.GemFireCacheImpl.shutDownOnePRGracefully(GemFireCacheImpl.java:1866) - locked <0x00000000e1fb0900> (a org.apache.geode.internal.cache.PRHARedundancyProvider) at org.apache.geode.internal.cache.GemFireCacheImpl.shutdownSubTreeGracefully(GemFireCacheImpl.java:1687) at org.apache.geode.internal.cache.GemFireCacheImpl.shutDownAll(GemFireCacheImpl.java:1745) - locked <0x00000000e01bfd78> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl) at org.apache.geode.internal.admin.remote.ShutdownAllRequest.createResponse(ShutdownAllRequest.java:181) at org.apache.geode.internal.admin.remote.ShutdownAllRequest.send(ShutdownAllRequest.java:88) at org.apache.geode.admin.internal.AdminDistributedSystemImpl.shutDownAllMembers(AdminDistributedSystemImpl.java:2342) at org.apache.geode.internal.cache.partitioned.PersistentColocatedPartitionedRegionDUnitTest$25.call(PersistentColocatedPartitionedRegionDUnitTest.java:1785) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123) at org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69) at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357) at sun.rmi.transport.Transport$1.run(Transport.java:200) at sun.rmi.transport.Transport$1.run(Transport.java:197) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:196) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$8/910555258.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Locked ownable synchronizers: - <0x00000000e00ed098> (a java.util.concurrent.ThreadPoolExecutor$Worker) {noformat} > PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate > hung or took too long > ---------------------------------------------------------------------------------------------------------------------------- > > Key: GEODE-7082 > URL: https://issues.apache.org/jira/browse/GEODE-7082 > Project: Geode > Issue Type: Bug > Components: tests > Reporter: Mark Hanson > Assignee: Jason Huynh > Priority: Major > Attachments: callstacks-2019-08-12-19-22-55.txt, > callstacks-2019-08-12-19-23-06.txt, callstacks-2019-08-12-19-23-16.txt, > dunit-hangs.txt > > Time Spent: 20m > Remaining Estimate: 0h > > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/981] > The run of the distributed test for java 8 failed because a timeout was > exceeded. > I've attached the dunit-hangs.txt and callstack files from dunit #981. It > looks like > PersistentColocatedPartitionedRegionDUnitTest.testReplaceOfflineMemberAndRestartCreateColocatedPRLate > hung. One thread is trying to create buckets while another is trying to > shutdown the cluster. Thread stacks are below. > There are several threads creating buckets and stuck waiting for replies to > PrepareNewPersistentMemberMessage: > {noformat} > "Pooled Waiting Message Processor 6" #2393 daemon prio=5 os_prio=0 > tid=0x00007f29a0005800 nid=0x1a84 waiting on condition [0x00007f2bd28c5000] > java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00000000e1fb4410> (a > java.util.concurrent.CountDownLatch$Sync) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) > at > org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72) > at > org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:732) > at > org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:803) > at > org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:780) > at > org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:866) > at > org.apache.geode.internal.cache.persistence.PrepareNewPersistentMemberMessage.send(PrepareNewPersistentMemberMessage.java:79) > at > org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.setInitializing(PersistenceAdvisorImpl.java:460) > at > org.apache.geode.internal.cache.BucketPersistenceAdvisor.setInitializing(BucketPersistenceAdvisor.java:416) > at > org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:324) > at > org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1219) > at > org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1079) > at > org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:259) > at > org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:980) > at > org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:783) > at > org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:458) > - locked <0x00000000e1fb4b28> (a > org.apache.geode.internal.cache.ProxyBucketRegion) > at > org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:317) > at > org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2897) > at > org.apache.geode.internal.cache.PRHARedundancyProvider.createBackupBucketOnMember(PRHARedundancyProvider.java:1086) > at > org.apache.geode.internal.cache.partitioned.PartitionedRegionRebalanceOp.createRedundantBucketForRegion(PartitionedRegionRebalanceOp.java:513) > at > org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorImpl.createRedundantBucket(BucketOperatorImpl.java:54) > at > org.apache.geode.internal.cache.partitioned.rebalance.BucketOperatorWrapper.createRedundantBucket(BucketOperatorWrapper.java:100) > at > org.apache.geode.internal.cache.partitioned.rebalance.ParallelBucketOperator$1.run(ParallelBucketOperator.java:91) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961) > at > org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851) > at > org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$55/1122354621.invoke(Unknown > Source) > at > org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121) > at > org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$51/717403135.run(Unknown > Source) > at java.lang.Thread.run(Thread.java:748) > Locked ownable synchronizers: > - <0x00000000e1fb4ee8> (a > java.util.concurrent.ThreadPoolExecutor$Worker) > {noformat} > There's one thread that appears to be stuck calling > AdminDistributedSystemImpl.shutDownAllMembers (NOTE: I don't think we should > have any tests using this deprecated internal-API to shutdown the cluster):: > {noformat} > vm0.invoke(new SerializableCallable() { > @Override > public Object call() throws Exception { > InternalDistributedSystem ds = > (InternalDistributedSystem) getCache().getDistributedSystem(); > > AdminDistributedSystemImpl.shutDownAllMembers(ds.getDistributionManager(), > 600000); > return null; > } > }); > {noformat} > Here's the above thread's callstack: > {noformat} > "RMI TCP Connection(1)-172.17.0.25" #32 daemon prio=5 os_prio=0 > tid=0x00007f2ad4001800 nid=0x280 waiting on condition [0x00007f2be13b5000] > java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00000000e1fb0600> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115) > at > org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lockInterruptibly(StoppableReentrantReadWriteLock.java:197) > at > org.apache.geode.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lock(StoppableReentrantReadWriteLock.java:182) > at > org.apache.geode.internal.cache.PartitionedRegionDataStore.cleanUp(PartitionedRegionDataStore.java:1399) > at > org.apache.geode.internal.cache.PartitionedRegion.closePartitionedRegion(PartitionedRegion.java:7140) > at > org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7716) > at > org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2707) > at > org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6175) > at > org.apache.geode.internal.cache.GemFireCacheImpl.shutDownOnePRGracefully(GemFireCacheImpl.java:1866) > - locked <0x00000000e1fb0900> (a > org.apache.geode.internal.cache.PRHARedundancyProvider) > at > org.apache.geode.internal.cache.GemFireCacheImpl.shutdownSubTreeGracefully(GemFireCacheImpl.java:1687) > at > org.apache.geode.internal.cache.GemFireCacheImpl.shutDownAll(GemFireCacheImpl.java:1745) > - locked <0x00000000e01bfd78> (a java.lang.Class for > org.apache.geode.internal.cache.GemFireCacheImpl) > at > org.apache.geode.internal.admin.remote.ShutdownAllRequest.createResponse(ShutdownAllRequest.java:181) > at > org.apache.geode.internal.admin.remote.ShutdownAllRequest.send(ShutdownAllRequest.java:88) > at > org.apache.geode.admin.internal.AdminDistributedSystemImpl.shutDownAllMembers(AdminDistributedSystemImpl.java:2342) > at > org.apache.geode.internal.cache.partitioned.PersistentColocatedPartitionedRegionDUnitTest$25.call(PersistentColocatedPartitionedRegionDUnitTest.java:1785) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123) > at > org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69) > at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357) > at sun.rmi.transport.Transport$1.run(Transport.java:200) > at sun.rmi.transport.Transport$1.run(Transport.java:197) > at java.security.AccessController.doPrivileged(Native Method) > at sun.rmi.transport.Transport.serviceCall(Transport.java:196) > at > sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$8/910555258.run(Unknown > Source) > at java.security.AccessController.doPrivileged(Native Method) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Locked ownable synchronizers: > - <0x00000000e00ed098> (a > java.util.concurrent.ThreadPoolExecutor$Worker) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016)