Kirk Lund created GEODE-7411:
--------------------------------
Summary:
IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs in CI
Key: GEODE-7411
URL: https://issues.apache.org/jira/browse/GEODE-7411
Project: Geode
Issue Type: Bug
Components: core, persistence
Reporter: Kirk Lund
IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs in CI. The
test also has another bug filed against it for a suspect string (GEODE-7252).
I analyzed the hang in
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/1274.
dunit-hangs.txt contains:
{noformat}
Started @ 2019-11-05 01:18:22.452 +0000
2019-11-05 02:19:21.812 +0000
org.apache.geode.internal.cache.backup.IncrementalBackupDistributedTest
testMissingMemberInBaseline
Ended @ 2019-11-05 03:19:58.983 +0000
{noformat}
The callstacks files show several BackupServiceThreads waiting on the
CountDownLatch {{otherMembersReady}}.
Several Asynchronous disk writer for region threads show some activity.
There is also one thread testWorker thread hung in Cache close while waiting
for DiskInitFile lock.
BackupServiceThreads:
{noformat}
"BackupServiceThread1" #202 daemon prio=10 os_prio=0 tid=0x00007fa50c001000
nid=0x473 waiting on condition [0x00007fa5acf1d000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000d0bf62c0> (a
java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at
org.apache.geode.internal.cache.backup.BackupTask.backup(BackupTask.java:72)
at
org.apache.geode.internal.cache.backup.BackupService.lambda$prepareBackup$0(BackupService.java:62)
at
org.apache.geode.internal.cache.backup.BackupService$$Lambda$268/1361475857.call(Unknown
Source)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x00000000d0bf63d8> (a
java.util.concurrent.ThreadPoolExecutor$Worker)
- <0x00000000d0c49960> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
- <0x00000000d1084ef8> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
{noformat}
{noformat}
"BackupServiceThread1" #193 daemon prio=5 os_prio=0 tid=0x00007f45fc82c000
nid=0x46d waiting on condition [0x00007f4716bd7000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0eb0a58> (a
java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at
org.apache.geode.internal.cache.backup.BackupTask.backup(BackupTask.java:72)
at
org.apache.geode.internal.cache.backup.BackupService.lambda$prepareBackup$0(BackupService.java:62)
at
org.apache.geode.internal.cache.backup.BackupService$$Lambda$253/1051811564.call(Unknown
Source)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x00000000e0e7f410> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
- <0x00000000e0e9e4f8> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
- <0x00000000e0eb0958> (a
java.util.concurrent.ThreadPoolExecutor$Worker)
{noformat}
{noformat}
"BackupServiceThread1" #102 daemon prio=5 os_prio=0 tid=0x00007f45fc536000
nid=0x35a waiting on condition [0x00007f471c9c2000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0a07500> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
{noformat}
{noformat}
"BackupServiceThread1" #81 daemon prio=10 os_prio=0 tid=0x00007faf38006800
nid=0x359 waiting on condition [0x00007faf5f8f7000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0a47f30> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
{noformat}
The "test worker" hung waiting on the DiskInitFile lock trying to close the
cache:
{noformat}
"Test worker" #25 prio=5 os_prio=0 tid=0x00007fa800a1d000 nid=0x5e waiting on
condition [0x00007fa6d5db3000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000d0c49960> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at
org.apache.geode.internal.cache.DiskInitFile.lock(DiskInitFile.java:389)
at
org.apache.geode.internal.cache.DiskInitFile.getOrCreateCanonicalId(DiskInitFile.java:2191)
at org.apache.geode.internal.cache.Oplog.serializeRVVs(Oplog.java:6253)
at org.apache.geode.internal.cache.Oplog.krfFileCreate(Oplog.java:3937)
at org.apache.geode.internal.cache.Oplog.createKrf(Oplog.java:4077)
- locked <0x00000000d0c49bf0> (a
java.util.concurrent.atomic.AtomicBoolean)
at org.apache.geode.internal.cache.Oplog.finishKrf(Oplog.java:6341)
at
org.apache.geode.internal.cache.Oplog.prepareForClose(Oplog.java:6346)
at
org.apache.geode.internal.cache.PersistentOplogSet.prepareForClose(PersistentOplogSet.java:980)
at
org.apache.geode.internal.cache.DiskStoreImpl.prepareForClose(DiskStoreImpl.java:2490)
at
org.apache.geode.internal.cache.GemFireCacheImpl.prepareDiskStoresForClose(GemFireCacheImpl.java:2432)
at
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2173)
- locked <0x00000000d09fdfa8> (a java.lang.Class for
org.apache.geode.internal.cache.GemFireCacheImpl)
at
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1911)
at
org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1901)
at
org.apache.geode.test.dunit.rules.CacheRule.closeCache(CacheRule.java:238)
at
org.apache.geode.test.dunit.rules.CacheRule.closeAndNullCache(CacheRule.java:214)
at org.apache.geode.test.dunit.rules.CacheRule.after(CacheRule.java:143)
at
org.apache.geode.test.dunit.rules.AbstractDistributedRule.afterDistributedTest(AbstractDistributedRule.java:81)
at
org.apache.geode.test.dunit.rules.AbstractDistributedRule$1.evaluate(AbstractDistributedRule.java:61)
at
org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40)
at
org.apache.geode.test.dunit.rules.AbstractDistributedRule$1.evaluate(AbstractDistributedRule.java:59)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
at
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
at
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at
org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x00000000d0016ad0> (a
java.util.concurrent.ThreadPoolExecutor$Worker)
- <0x00000000d10cfc00> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
{noformat}
And finally several "Asynchronous disk writer for region" threads showing some
activity:
{noformat}
"Asynchronous disk writer for region
IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-2" #184
daemon prio=5 os_prio=0 tid=0x00007fa6760f
b000 nid=0x43b in Object.wait() [0x00007fa5ae817000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:460)
at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
at
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
- locked <0x00000000d0c1b648> (a java.lang.Object)
at
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
at
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
{noformat}
{noformat}
"Asynchronous disk writer for region
IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-1" #182
daemon prio=5 os_prio=0 tid=0x00007fa6760f
9800 nid=0x439 in Object.wait() [0x00007fa5aec1b000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:460)
at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
at
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
- locked <0x00000000d0991050> (a java.lang.Object)
at
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
at
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
{noformat}
{noformat}
"Asynchronous disk writer for region
IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-2" #168
daemon prio=5 os_prio=0 tid=0x00007f45fc52
8800 nid=0x3fa in Object.wait() [0x00007f4716fdb000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:460)
at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
at
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
- locked <0x00000000e0ce44e8> (a java.lang.Object)
at
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
at
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
{noformat}
{noformat}
"Asynchronous disk writer for region
IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-1" #166
daemon prio=5 os_prio=0 tid=0x00007f45fc2ae800 nid=0x3f8 in Object.wait()
[0x00007f471ede0000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:460)
at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
at
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
- locked <0x00000000e0c4b1b0> (a java.lang.Object)
at
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
at
org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
{noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)