Kirk Lund created GEODE-7411:
--------------------------------

             Summary: 
IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs in CI
                 Key: GEODE-7411
                 URL: https://issues.apache.org/jira/browse/GEODE-7411
             Project: Geode
          Issue Type: Bug
          Components: core, persistence
            Reporter: Kirk Lund


IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs in CI. The 
test also has another bug filed against it for a suspect string (GEODE-7252).

I analyzed the hang in 
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/1274.

dunit-hangs.txt contains:
{noformat}
Started @ 2019-11-05 01:18:22.452 +0000
2019-11-05 02:19:21.812 +0000  
org.apache.geode.internal.cache.backup.IncrementalBackupDistributedTest 
testMissingMemberInBaseline
Ended @ 2019-11-05 03:19:58.983 +0000
{noformat}

The callstacks files show several BackupServiceThreads waiting on the 
CountDownLatch {{otherMembersReady}}.

Several Asynchronous disk writer for region threads show some activity.

There is also one thread testWorker thread hung in Cache close while waiting 
for DiskInitFile lock.

BackupServiceThreads:
{noformat}
"BackupServiceThread1" #202 daemon prio=10 os_prio=0 tid=0x00007fa50c001000 
nid=0x473 waiting on condition [0x00007fa5acf1d000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000d0bf62c0> (a 
java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at 
org.apache.geode.internal.cache.backup.BackupTask.backup(BackupTask.java:72)
        at 
org.apache.geode.internal.cache.backup.BackupService.lambda$prepareBackup$0(BackupService.java:62)
        at 
org.apache.geode.internal.cache.backup.BackupService$$Lambda$268/1361475857.call(Unknown
 Source)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00000000d0bf63d8> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
        - <0x00000000d0c49960> (a 
java.util.concurrent.locks.ReentrantLock$NonfairSync)
        - <0x00000000d1084ef8> (a 
java.util.concurrent.locks.ReentrantLock$NonfairSync)
{noformat}
{noformat}
"BackupServiceThread1" #193 daemon prio=5 os_prio=0 tid=0x00007f45fc82c000 
nid=0x46d waiting on condition [0x00007f4716bd7000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e0eb0a58> (a 
java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at 
org.apache.geode.internal.cache.backup.BackupTask.backup(BackupTask.java:72)
        at 
org.apache.geode.internal.cache.backup.BackupService.lambda$prepareBackup$0(BackupService.java:62)
        at 
org.apache.geode.internal.cache.backup.BackupService$$Lambda$253/1051811564.call(Unknown
 Source)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00000000e0e7f410> (a 
java.util.concurrent.locks.ReentrantLock$NonfairSync)
        - <0x00000000e0e9e4f8> (a 
java.util.concurrent.locks.ReentrantLock$NonfairSync)
        - <0x00000000e0eb0958> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
{noformat}
{noformat}
"BackupServiceThread1" #102 daemon prio=5 os_prio=0 tid=0x00007f45fc536000 
nid=0x35a waiting on condition [0x00007f471c9c2000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e0a07500> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
{noformat}
{noformat}
"BackupServiceThread1" #81 daemon prio=10 os_prio=0 tid=0x00007faf38006800 
nid=0x359 waiting on condition [0x00007faf5f8f7000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e0a47f30> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
{noformat}

The "test worker" hung waiting on the DiskInitFile lock trying to close the 
cache:
{noformat}
"Test worker" #25 prio=5 os_prio=0 tid=0x00007fa800a1d000 nid=0x5e waiting on 
condition [0x00007fa6d5db3000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000d0c49960> (a 
java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at 
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at 
org.apache.geode.internal.cache.DiskInitFile.lock(DiskInitFile.java:389)
        at 
org.apache.geode.internal.cache.DiskInitFile.getOrCreateCanonicalId(DiskInitFile.java:2191)
        at org.apache.geode.internal.cache.Oplog.serializeRVVs(Oplog.java:6253)
        at org.apache.geode.internal.cache.Oplog.krfFileCreate(Oplog.java:3937)
        at org.apache.geode.internal.cache.Oplog.createKrf(Oplog.java:4077)
        - locked <0x00000000d0c49bf0> (a 
java.util.concurrent.atomic.AtomicBoolean)
        at org.apache.geode.internal.cache.Oplog.finishKrf(Oplog.java:6341)
        at 
org.apache.geode.internal.cache.Oplog.prepareForClose(Oplog.java:6346)
        at 
org.apache.geode.internal.cache.PersistentOplogSet.prepareForClose(PersistentOplogSet.java:980)
        at 
org.apache.geode.internal.cache.DiskStoreImpl.prepareForClose(DiskStoreImpl.java:2490)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.prepareDiskStoresForClose(GemFireCacheImpl.java:2432)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2173)
        - locked <0x00000000d09fdfa8> (a java.lang.Class for 
org.apache.geode.internal.cache.GemFireCacheImpl)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1911)
        at 
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1901)
        at 
org.apache.geode.test.dunit.rules.CacheRule.closeCache(CacheRule.java:238)
        at 
org.apache.geode.test.dunit.rules.CacheRule.closeAndNullCache(CacheRule.java:214)
        at org.apache.geode.test.dunit.rules.CacheRule.after(CacheRule.java:143)
        at 
org.apache.geode.test.dunit.rules.AbstractDistributedRule.afterDistributedTest(AbstractDistributedRule.java:81)
        at 
org.apache.geode.test.dunit.rules.AbstractDistributedRule$1.evaluate(AbstractDistributedRule.java:61)
        at 
org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40)
        at 
org.apache.geode.test.dunit.rules.AbstractDistributedRule$1.evaluate(AbstractDistributedRule.java:59)
        at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
        at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
        at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
        at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
        at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
        at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
        at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
        at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
        at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
        at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
        at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
        at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
        at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
        at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
        at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
        at 
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
        at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
        at 
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at 
org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00000000d0016ad0> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
        - <0x00000000d10cfc00> (a 
java.util.concurrent.locks.ReentrantLock$NonfairSync)
{noformat}

And finally several "Asynchronous disk writer for region" threads showing some 
activity:
{noformat}
"Asynchronous disk writer for region 
IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-2" #184 
daemon prio=5 os_prio=0 tid=0x00007fa6760f
b000 nid=0x43b in Object.wait() [0x00007fa5ae817000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:460)
        at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
        at 
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
        - locked <0x00000000d0c1b648> (a java.lang.Object)
        at 
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
        at 
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
{noformat}
{noformat}
"Asynchronous disk writer for region 
IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-1" #182 
daemon prio=5 os_prio=0 tid=0x00007fa6760f
9800 nid=0x439 in Object.wait() [0x00007fa5aec1b000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:460)
        at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
        at 
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
        - locked <0x00000000d0991050> (a java.lang.Object)
        at 
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
        at 
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
{noformat}
{noformat}
"Asynchronous disk writer for region 
IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-2" #168 
daemon prio=5 os_prio=0 tid=0x00007f45fc52
8800 nid=0x3fa in Object.wait() [0x00007f4716fdb000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:460)
        at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
        at 
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
        - locked <0x00000000e0ce44e8> (a java.lang.Object)
        at 
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
        at 
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
{noformat}
{noformat}
"Asynchronous disk writer for region 
IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-1" #166 
daemon prio=5 os_prio=0 tid=0x00007f45fc2ae800 nid=0x3f8 in Object.wait() 
[0x00007f471ede0000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:460)
        at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
        at 
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
        - locked <0x00000000e0c4b1b0> (a java.lang.Object)
        at 
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
        at 
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to