Bruce J Schuchardt created GEODE-8474:
-----------------------------------------
Summary: Hang during restart in PersistentBucketRecoverer due to
use of CountDownLatch
Key: GEODE-8474
URL: https://issues.apache.org/jira/browse/GEODE-8474
Project: Geode
Issue Type: Bug
Components: persistence
Affects Versions: 1.0.0-incubating
Reporter: Bruce J Schuchardt
In the test described in GEODE-8467, a node that experienced a Forced
Disconnect during startup hung, with threads waiting for other members to come
up. These threads ignored the Forced Disconnect because they wait on a
CountDownLatch rather than a StoppableCountDownLatch. The stoppable form of
the latch checks for shutdown conditions and breaks out of an "await()" on the
latch when one is detected.
As part of fixing this ticket we should scan for other uses of CountDownLatch
and consider replacing those with StoppableCountDownLatch.
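
To illustrate the difference between the two latch types, here is a minimal sketch of a latch whose await() gives up when a shutdown condition is detected. It is not Geode's StoppableCountDownLatch: the names CancellableLatch, CancellableLatchDemo, the BooleanSupplier standing in for the shutdown check, and the polling interval are all assumptions made for this example.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch only; not the Geode implementation. */
final class CancellableLatch {
  private final CountDownLatch delegate;
  // Stand-in for a shutdown / forced-disconnect check (assumed for this example).
  private final BooleanSupplier cancelled;
  private final long pollMillis;

  CancellableLatch(int count, BooleanSupplier cancelled, long pollMillis) {
    this.delegate = new CountDownLatch(count);
    this.cancelled = cancelled;
    this.pollMillis = pollMillis;
  }

  void countDown() {
    delegate.countDown();
  }

  /**
   * Waits in short timed slices instead of parking indefinitely, and checks
   * the cancel condition before each slice. A plain CountDownLatch.await()
   * would stay parked until the count reaches zero or the thread is interrupted.
   */
  void await() throws InterruptedException {
    while (true) {
      if (cancelled.getAsBoolean()) {
        throw new CancellationException("wait abandoned: shutdown in progress");
      }
      if (delegate.await(pollMillis, TimeUnit.MILLISECONDS)) {
        return; // count reached zero
      }
    }
  }
}

final class CancellableLatchDemo {
  /** Returns true if a waiter blocked on the latch is released once shutdown is flagged. */
  static boolean awaitIsAbandonedOnCancel() {
    AtomicBoolean shuttingDown = new AtomicBoolean(false);
    AtomicBoolean abandoned = new AtomicBoolean(false);
    // Count of 1 that is never counted down, mimicking a member that never comes up.
    CancellableLatch latch = new CancellableLatch(1, shuttingDown::get, 10);

    Thread waiter = new Thread(() -> {
      try {
        latch.await();
      } catch (CancellationException e) {
        abandoned.set(true);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    waiter.start();
    shuttingDown.set(true); // simulate the forced disconnect
    try {
      waiter.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return abandoned.get() && !waiter.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("abandoned on cancel: " + awaitIsAbandonedOnCancel());
  }
}
```

Polling with a timed await() trades a small amount of wake-up overhead for the guarantee that a waiter notices shutdown within roughly one polling interval, even if nobody interrupts the thread.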
{noformat}
"vm_3_thr_5_dataStore10_host1_16547" #19 daemon prio=5 os_prio=0 tid=0x00007f451c003800 nid=0x418f waiting on condition [0x00007f4540bb5000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e50ebc70> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at org.apache.geode.internal.cache.partitioned.PersistentBucketRecoverer.await(PersistentBucketRecoverer.java:447)
at org.apache.geode.internal.cache.PRHARedundancyProvider.waitForPersistentBucketRecovery(PRHARedundancyProvider.java:1877)
at org.apache.geode.internal.cache.PartitionedRegion.cleanupFailedInitialization(PartitionedRegion.java:5649)
at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3011)
at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2891)
at org.apache.geode.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2875)
at hydra.RegionHelper.createRegion(RegionHelper.java:106)
- locked <0x00000000e39e3d50> (a java.lang.Class for hydra.RegionHelper)
at hydra.RegionHelper.createRegion(RegionHelper.java:76)
- locked <0x00000000e39e3d50> (a java.lang.Class for hydra.RegionHelper)
at hydra.RegionHelper.createRegion(RegionHelper.java:65)
- locked <0x00000000e39e3d50> (a java.lang.Class for hydra.RegionHelper)
at hydra.RegionHelper.createRegion(RegionHelper.java:47)
- locked <0x00000000e39e3d50> (a java.lang.Class for hydra.RegionHelper)
at management.test.cli.CommandTestVersionHelper.createRegions(CommandTestVersionHelper.java:70)
at management.test.cli.CommandTest.HydraTask_initializeRegions(CommandTest.java:335)
- locked <0x00000000e1216488> (a java.lang.Class for management.test.cli.CommandTest)
at management.test.cli.CommandTest.HydraTask_configurableInit(CommandTest.java:259)
- locked <0x00000000e1216488> (a java.lang.Class for management.test.cli.CommandTest)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at hydra.MethExecutor.execute(MethExecutor.java:173)
at hydra.MethExecutor.execute(MethExecutor.java:141)
at hydra.TestTask.execute(TestTask.java:197)
at hydra.RemoteTestModule$1.run(RemoteTestModule.java:213)
{noformat}
This problem is in all versions of Geode to date (v1.14), so it probably
doesn't need to be backported; however, it does cause a hang during restart.