[
https://issues.apache.org/jira/browse/GEODE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kirk Lund reassigned GEODE-7411:
--------------------------------
Assignee: Mark Hanson
> IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs in CI
> ------------------------------------------------------------------------
>
> Key: GEODE-7411
> URL: https://issues.apache.org/jira/browse/GEODE-7411
> Project: Geode
> Issue Type: Bug
> Components: core, persistence
> Reporter: Kirk Lund
> Assignee: Mark Hanson
> Priority: Major
>
> IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs in CI. The
> test also has another bug filed against it for a suspect string (GEODE-7252).
> I analyzed the hang in
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/1274.
> dunit-hangs.txt contains:
> {noformat}
> Started @ 2019-11-05 01:18:22.452 +0000
> 2019-11-05 02:19:21.812 +0000
> org.apache.geode.internal.cache.backup.IncrementalBackupDistributedTest
> testMissingMemberInBaseline
> Ended @ 2019-11-05 03:19:58.983 +0000
> {noformat}
> The callstacks files show several BackupServiceThreads waiting on the
> CountDownLatch {{otherMembersReady}}.
> Several Asynchronous disk writer for region threads show some activity.
> There is also one thread testWorker thread hung in Cache close while waiting
> for DiskInitFile lock.
> BackupServiceThreads:
> {noformat}
> "BackupServiceThread1" #202 daemon prio=10 os_prio=0 tid=0x00007fa50c001000
> nid=0x473 waiting on condition [0x00007fa5acf1d000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000000d0bf62c0> (a
> java.util.concurrent.CountDownLatch$Sync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> at
> org.apache.geode.internal.cache.backup.BackupTask.backup(BackupTask.java:72)
> at
> org.apache.geode.internal.cache.backup.BackupService.lambda$prepareBackup$0(BackupService.java:62)
> at
> org.apache.geode.internal.cache.backup.BackupService$$Lambda$268/1361475857.call(Unknown
> Source)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - <0x00000000d0bf63d8> (a
> java.util.concurrent.ThreadPoolExecutor$Worker)
> - <0x00000000d0c49960> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> - <0x00000000d1084ef8> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> {noformat}
> {noformat}
> "BackupServiceThread1" #193 daemon prio=5 os_prio=0 tid=0x00007f45fc82c000
> nid=0x46d waiting on condition [0x00007f4716bd7000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000000e0eb0a58> (a
> java.util.concurrent.CountDownLatch$Sync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> at
> org.apache.geode.internal.cache.backup.BackupTask.backup(BackupTask.java:72)
> at
> org.apache.geode.internal.cache.backup.BackupService.lambda$prepareBackup$0(BackupService.java:62)
> at
> org.apache.geode.internal.cache.backup.BackupService$$Lambda$253/1051811564.call(Unknown
> Source)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - <0x00000000e0e7f410> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> - <0x00000000e0e9e4f8> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> - <0x00000000e0eb0958> (a
> java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> {noformat}
> "BackupServiceThread1" #102 daemon prio=5 os_prio=0 tid=0x00007f45fc536000
> nid=0x35a waiting on condition [0x00007f471c9c2000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000000e0a07500> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - None
> {noformat}
> {noformat}
> "BackupServiceThread1" #81 daemon prio=10 os_prio=0 tid=0x00007faf38006800
> nid=0x359 waiting on condition [0x00007faf5f8f7000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000000e0a47f30> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - None
> {noformat}
> The "test worker" hung waiting on the DiskInitFile lock trying to close the
> cache:
> {noformat}
> "Test worker" #25 prio=5 os_prio=0 tid=0x00007fa800a1d000 nid=0x5e waiting on
> condition [0x00007fa6d5db3000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000000d0c49960> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> at
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> at
> org.apache.geode.internal.cache.DiskInitFile.lock(DiskInitFile.java:389)
> at
> org.apache.geode.internal.cache.DiskInitFile.getOrCreateCanonicalId(DiskInitFile.java:2191)
> at
> org.apache.geode.internal.cache.Oplog.serializeRVVs(Oplog.java:6253)
> at
> org.apache.geode.internal.cache.Oplog.krfFileCreate(Oplog.java:3937)
> at org.apache.geode.internal.cache.Oplog.createKrf(Oplog.java:4077)
> - locked <0x00000000d0c49bf0> (a
> java.util.concurrent.atomic.AtomicBoolean)
> at org.apache.geode.internal.cache.Oplog.finishKrf(Oplog.java:6341)
> at
> org.apache.geode.internal.cache.Oplog.prepareForClose(Oplog.java:6346)
> at
> org.apache.geode.internal.cache.PersistentOplogSet.prepareForClose(PersistentOplogSet.java:980)
> at
> org.apache.geode.internal.cache.DiskStoreImpl.prepareForClose(DiskStoreImpl.java:2490)
> at
> org.apache.geode.internal.cache.GemFireCacheImpl.prepareDiskStoresForClose(GemFireCacheImpl.java:2432)
> at
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2173)
> - locked <0x00000000d09fdfa8> (a java.lang.Class for
> org.apache.geode.internal.cache.GemFireCacheImpl)
> at
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1911)
> at
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1901)
> at
> org.apache.geode.test.dunit.rules.CacheRule.closeCache(CacheRule.java:238)
> at
> org.apache.geode.test.dunit.rules.CacheRule.closeAndNullCache(CacheRule.java:214)
> at
> org.apache.geode.test.dunit.rules.CacheRule.after(CacheRule.java:143)
> at
> org.apache.geode.test.dunit.rules.AbstractDistributedRule.afterDistributedTest(AbstractDistributedRule.java:81)
> at
> org.apache.geode.test.dunit.rules.AbstractDistributedRule$1.evaluate(AbstractDistributedRule.java:61)
> at
> org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40)
> at
> org.apache.geode.test.dunit.rules.AbstractDistributedRule$1.evaluate(AbstractDistributedRule.java:59)
> at
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
> at
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
> at
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
> at
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
> at
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> at
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
> at
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
> at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
> at
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> at
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
> at
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
> at
> org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
> at
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
> at
> org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at
> org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - <0x00000000d0016ad0> (a
> java.util.concurrent.ThreadPoolExecutor$Worker)
> - <0x00000000d10cfc00> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> {noformat}
> And finally several "Asynchronous disk writer for region" threads showing
> some activity:
> {noformat}
> "Asynchronous disk writer for region
> IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-2"
> #184 daemon prio=5 os_prio=0 tid=0x00007fa6760f
> b000 nid=0x43b in Object.wait() [0x00007fa5ae817000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:460)
> at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
> at
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
> - locked <0x00000000d0c1b648> (a java.lang.Object)
> at
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
> at
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - None
> {noformat}
> {noformat}
> "Asynchronous disk writer for region
> IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-1"
> #182 daemon prio=5 os_prio=0 tid=0x00007fa6760f
> 9800 nid=0x439 in Object.wait() [0x00007fa5aec1b000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:460)
> at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
> at
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
> - locked <0x00000000d0991050> (a java.lang.Object)
> at
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
> at
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - None
> {noformat}
> {noformat}
> "Asynchronous disk writer for region
> IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-2"
> #168 daemon prio=5 os_prio=0 tid=0x00007f45fc52
> 8800 nid=0x3fa in Object.wait() [0x00007f4716fdb000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:460)
> at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
> at
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
> - locked <0x00000000e0ce44e8> (a java.lang.Object)
> at
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
> at
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - None
> {noformat}
> {noformat}
> "Asynchronous disk writer for region
> IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-1"
> #166 daemon prio=5 os_prio=0 tid=0x00007f45fc2ae800 nid=0x3f8 in
> Object.wait() [0x00007f471ede0000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:460)
> at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
> at
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
> - locked <0x00000000e0c4b1b0> (a java.lang.Object)
> at
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
> at
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - None
> {noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)