[ 
https://issues.apache.org/jira/browse/GEODE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund reassigned GEODE-7411:
--------------------------------

    Assignee: Mark Hanson

> IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs in CI
> ------------------------------------------------------------------------
>
>                 Key: GEODE-7411
>                 URL: https://issues.apache.org/jira/browse/GEODE-7411
>             Project: Geode
>          Issue Type: Bug
>          Components: core, persistence
>            Reporter: Kirk Lund
>            Assignee: Mark Hanson
>            Priority: Major
>
> IncrementalBackupDistributedTest.testMissingMemberInBaseline hangs in CI. The 
> test also has another bug filed against it for a suspect string (GEODE-7252).
> I analyzed the hang in 
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/1274.
> dunit-hangs.txt contains:
> {noformat}
> Started @ 2019-11-05 01:18:22.452 +0000
> 2019-11-05 02:19:21.812 +0000  
> org.apache.geode.internal.cache.backup.IncrementalBackupDistributedTest 
> testMissingMemberInBaseline
> Ended @ 2019-11-05 03:19:58.983 +0000
> {noformat}
> The callstacks files show several BackupServiceThreads waiting on the 
> CountDownLatch {{otherMembersReady}}.
> Several Asynchronous disk writer for region threads show some activity.
> There is also one thread testWorker thread hung in Cache close while waiting 
> for DiskInitFile lock.
> BackupServiceThreads:
> {noformat}
> "BackupServiceThread1" #202 daemon prio=10 os_prio=0 tid=0x00007fa50c001000 
> nid=0x473 waiting on condition [0x00007fa5acf1d000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000d0bf62c0> (a 
> java.util.concurrent.CountDownLatch$Sync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>         at 
> org.apache.geode.internal.cache.backup.BackupTask.backup(BackupTask.java:72)
>         at 
> org.apache.geode.internal.cache.backup.BackupService.lambda$prepareBackup$0(BackupService.java:62)
>         at 
> org.apache.geode.internal.cache.backup.BackupService$$Lambda$268/1361475857.call(Unknown
>  Source)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x00000000d0bf63d8> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
>         - <0x00000000d0c49960> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         - <0x00000000d1084ef8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> {noformat}
> {noformat}
> "BackupServiceThread1" #193 daemon prio=5 os_prio=0 tid=0x00007f45fc82c000 
> nid=0x46d waiting on condition [0x00007f4716bd7000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000e0eb0a58> (a 
> java.util.concurrent.CountDownLatch$Sync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>         at 
> org.apache.geode.internal.cache.backup.BackupTask.backup(BackupTask.java:72)
>         at 
> org.apache.geode.internal.cache.backup.BackupService.lambda$prepareBackup$0(BackupService.java:62)
>         at 
> org.apache.geode.internal.cache.backup.BackupService$$Lambda$253/1051811564.call(Unknown
>  Source)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x00000000e0e7f410> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         - <0x00000000e0e9e4f8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         - <0x00000000e0eb0958> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> {noformat}
> "BackupServiceThread1" #102 daemon prio=5 os_prio=0 tid=0x00007f45fc536000 
> nid=0x35a waiting on condition [0x00007f471c9c2000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000e0a07500> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> {noformat}
> "BackupServiceThread1" #81 daemon prio=10 os_prio=0 tid=0x00007faf38006800 
> nid=0x359 waiting on condition [0x00007faf5f8f7000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000e0a47f30> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> The "test worker" hung waiting on the DiskInitFile lock trying to close the 
> cache:
> {noformat}
> "Test worker" #25 prio=5 os_prio=0 tid=0x00007fa800a1d000 nid=0x5e waiting on 
> condition [0x00007fa6d5db3000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000d0c49960> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>         at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>         at 
> org.apache.geode.internal.cache.DiskInitFile.lock(DiskInitFile.java:389)
>         at 
> org.apache.geode.internal.cache.DiskInitFile.getOrCreateCanonicalId(DiskInitFile.java:2191)
>         at 
> org.apache.geode.internal.cache.Oplog.serializeRVVs(Oplog.java:6253)
>         at 
> org.apache.geode.internal.cache.Oplog.krfFileCreate(Oplog.java:3937)
>         at org.apache.geode.internal.cache.Oplog.createKrf(Oplog.java:4077)
>         - locked <0x00000000d0c49bf0> (a 
> java.util.concurrent.atomic.AtomicBoolean)
>         at org.apache.geode.internal.cache.Oplog.finishKrf(Oplog.java:6341)
>         at 
> org.apache.geode.internal.cache.Oplog.prepareForClose(Oplog.java:6346)
>         at 
> org.apache.geode.internal.cache.PersistentOplogSet.prepareForClose(PersistentOplogSet.java:980)
>         at 
> org.apache.geode.internal.cache.DiskStoreImpl.prepareForClose(DiskStoreImpl.java:2490)
>         at 
> org.apache.geode.internal.cache.GemFireCacheImpl.prepareDiskStoresForClose(GemFireCacheImpl.java:2432)
>         at 
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2173)
>         - locked <0x00000000d09fdfa8> (a java.lang.Class for 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>         at 
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1911)
>         at 
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1901)
>         at 
> org.apache.geode.test.dunit.rules.CacheRule.closeCache(CacheRule.java:238)
>         at 
> org.apache.geode.test.dunit.rules.CacheRule.closeAndNullCache(CacheRule.java:214)
>         at 
> org.apache.geode.test.dunit.rules.CacheRule.after(CacheRule.java:143)
>         at 
> org.apache.geode.test.dunit.rules.AbstractDistributedRule.afterDistributedTest(AbstractDistributedRule.java:81)
>         at 
> org.apache.geode.test.dunit.rules.AbstractDistributedRule$1.evaluate(AbstractDistributedRule.java:61)
>         at 
> org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40)
>         at 
> org.apache.geode.test.dunit.rules.AbstractDistributedRule$1.evaluate(AbstractDistributedRule.java:59)
>         at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>         at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>         at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>         at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>         at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>         at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>         at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>         at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>         at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>         at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>         at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>         at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>         at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>         at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>         at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>         at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>         at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>         at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>         at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
>         at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
>         at 
> org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
>         at 
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
>         at 
> org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at 
> org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x00000000d0016ad0> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
>         - <0x00000000d10cfc00> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> {noformat}
> And finally several "Asynchronous disk writer for region" threads showing 
> some activity:
> {noformat}
> "Asynchronous disk writer for region 
> IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-2" 
> #184 daemon prio=5 os_prio=0 tid=0x00007fa6760f
> b000 nid=0x43b in Object.wait() [0x00007fa5ae817000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:460)
>         at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>         at 
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
>         - locked <0x00000000d0c1b648> (a java.lang.Object)
>         at 
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
>         at 
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> {noformat}
> "Asynchronous disk writer for region 
> IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-1" 
> #182 daemon prio=5 os_prio=0 tid=0x00007fa6760f
> 9800 nid=0x439 in Object.wait() [0x00007fa5aec1b000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:460)
>         at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>         at 
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
>         - locked <0x00000000d0991050> (a java.lang.Object)
>         at 
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
>         at 
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> {noformat}
> "Asynchronous disk writer for region 
> IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-2" 
> #168 daemon prio=5 os_prio=0 tid=0x00007f45fc52
> 8800 nid=0x3fa in Object.wait() [0x00007f4716fdb000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:460)
>         at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>         at 
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
>         - locked <0x00000000e0ce44e8> (a java.lang.Object)
>         at 
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
>         at 
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> {noformat}
> "Asynchronous disk writer for region 
> IncrementalBackupDistributedTest_testMissingMemberInBaseline_diskStore-1" 
> #166 daemon prio=5 os_prio=0 tid=0x00007f45fc2ae800 nid=0x3f8 in 
> Object.wait() [0x00007f471ede0000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:460)
>         at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>         at 
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.waitUntilFlushIsReady(DiskStoreImpl.java:1691)
>         - locked <0x00000000e0c4b1b0> (a java.lang.Object)
>         at 
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.doAsyncFlush(DiskStoreImpl.java:1728)
>         at 
> org.apache.geode.internal.cache.DiskStoreImpl$FlusherThread.run(DiskStoreImpl.java:1718)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - None
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to