Bruce J Schuchardt created GEODE-8474:
-----------------------------------------

             Summary: Hang during restart in PersistentBucketRecoverer due to use of CountDownLatch
                 Key: GEODE-8474
                 URL: https://issues.apache.org/jira/browse/GEODE-8474
             Project: Geode
          Issue Type: Bug
          Components: persistence
    Affects Versions: 1.0.0-incubating
            Reporter: Bruce J Schuchardt


In the test described in GEODE-8467 we saw a node that experienced a Forced 
Disconnect during startup and then hung, with threads waiting for other members 
to come up. These threads ignored the Forced Disconnect because they use a 
CountDownLatch rather than a StoppableCountDownLatch. The stoppable form of the 
latch pays attention to shutdown conditions and aborts an "await()" on the 
latch when the member is shutting down.
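A minimal sketch of the stoppable-latch idea follows. It is not Geode's actual StoppableCountDownLatch. It assumes a hypothetical ShutdownCheck interface standing in for Geode's cancel criterion and a hypothetical ShutdownException, and Geode's real classes may differ in detail. Instead of parking indefinitely in await(), the latch waits in short timed slices and re-checks the shutdown condition before each slice, so a Forced Disconnect ends the wait with an exception rather than a hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch of a "stoppable" latch: a CountDownLatch whose await() periodically
 * re-checks a shutdown condition instead of parking indefinitely.
 */
public class StoppableLatchSketch {

  /** Hypothetical stand-in for a cancel criterion; throws if shutdown is in progress. */
  public interface ShutdownCheck {
    void checkShutdown();
  }

  /** Hypothetical exception signalling that a wait was abandoned because of shutdown. */
  public static class ShutdownException extends RuntimeException {
    public ShutdownException(String message) {
      super(message);
    }
  }

  private final CountDownLatch latch;
  private final ShutdownCheck stopper;
  private final long pollMillis;

  public StoppableLatchSketch(int count, ShutdownCheck stopper, long pollMillis) {
    this.latch = new CountDownLatch(count);
    this.stopper = stopper;
    this.pollMillis = pollMillis;
  }

  public void countDown() {
    latch.countDown();
  }

  /**
   * Waits until the count reaches zero, re-checking the shutdown condition
   * before each timed wait so that a disconnect ends the wait with an exception.
   */
  public void await() throws InterruptedException {
    do {
      stopper.checkShutdown(); // throws ShutdownException if the member is going down
    } while (!latch.await(pollMillis, TimeUnit.MILLISECONDS));
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicBoolean disconnected = new AtomicBoolean(false);
    StoppableLatchSketch latch = new StoppableLatchSketch(1, () -> {
      if (disconnected.get()) {
        throw new ShutdownException("forced disconnect");
      }
    }, 50);

    // Simulate a Forced Disconnect arriving while no other member has come up.
    Thread disconnector = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException ignored) {
        // fall through and mark the member as disconnected anyway
      }
      disconnected.set(true);
    });
    disconnector.start();

    try {
      latch.await(); // a plain CountDownLatch.await() would stay parked here
      System.out.println("recovered");
    } catch (ShutdownException e) {
      System.out.println("aborted: " + e.getMessage()); // prints "aborted: forced disconnect"
    }
    disconnector.join();
  }
}
```

With a plain CountDownLatch, an untimed await() only returns when the count reaches zero or the thread is interrupted. That is why the thread in the dump stays parked. The poll interval is a trade-off: a shorter interval reacts to shutdown sooner at the cost of more wakeups.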

As part of fixing this ticket we should scan for other uses of CountDownLatch 
and consider replacing those with StoppableCountDownLatch.
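One rough starting point for the scan suggested above is a small utility that walks a source tree and reports lines that construct a plain CountDownLatch. LatchUsageScanner is a hypothetical helper, not part of Geode. It relies on a simple text match, so it will miss latches created through fully qualified names or factory methods. Every hit still needs a human decision about whether the waiting thread should observe cache shutdown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Lists source lines that construct a plain CountDownLatch, as candidates for review. */
public class LatchUsageScanner {

  // Simple textual heuristic. "new StoppableCountDownLatch(" does not contain this
  // pattern, but fully qualified "new java.util.concurrent.CountDownLatch(" is missed.
  private static final String PATTERN = "new CountDownLatch(";

  public static List<String> findPlainLatches(Path root) throws IOException {
    List<Path> javaFiles;
    try (Stream<Path> paths = Files.walk(root)) {
      javaFiles = paths
          .filter(Files::isRegularFile)
          .filter(p -> p.toString().endsWith(".java"))
          .sorted()
          .collect(Collectors.toList());
    }
    List<String> hits = new ArrayList<>();
    for (Path file : javaFiles) {
      List<String> lines = Files.readAllLines(file); // assumes UTF-8 sources
      for (int i = 0; i < lines.size(); i++) {
        if (lines.get(i).contains(PATTERN)) {
          // Report as "path:lineNumber: source text" (1-based line numbers).
          hits.add(file + ":" + (i + 1) + ": " + lines.get(i).trim());
        }
      }
    }
    return hits;
  }

  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    for (String hit : findPlainLatches(root)) {
      System.out.println(hit);
    }
  }
}
```

To use it, pass a module's source root as the first argument (for example `java LatchUsageScanner path/to/src/main/java`) and review each reported line.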

{noformat}
"vm_3_thr_5_dataStore10_host1_16547" #19 daemon prio=5 os_prio=0 tid=0x00007f451c003800 nid=0x418f waiting on condition [0x00007f4540bb5000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000e50ebc70> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at org.apache.geode.internal.cache.partitioned.PersistentBucketRecoverer.await(PersistentBucketRecoverer.java:447)
        at org.apache.geode.internal.cache.PRHARedundancyProvider.waitForPersistentBucketRecovery(PRHARedundancyProvider.java:1877)
        at org.apache.geode.internal.cache.PartitionedRegion.cleanupFailedInitialization(PartitionedRegion.java:5649)
        at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3011)
        at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2891)
        at org.apache.geode.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2875)
        at hydra.RegionHelper.createRegion(RegionHelper.java:106)
        - locked <0x00000000e39e3d50> (a java.lang.Class for hydra.RegionHelper)
        at hydra.RegionHelper.createRegion(RegionHelper.java:76)
        - locked <0x00000000e39e3d50> (a java.lang.Class for hydra.RegionHelper)
        at hydra.RegionHelper.createRegion(RegionHelper.java:65)
        - locked <0x00000000e39e3d50> (a java.lang.Class for hydra.RegionHelper)
        at hydra.RegionHelper.createRegion(RegionHelper.java:47)
        - locked <0x00000000e39e3d50> (a java.lang.Class for hydra.RegionHelper)
        at management.test.cli.CommandTestVersionHelper.createRegions(CommandTestVersionHelper.java:70)
        at management.test.cli.CommandTest.HydraTask_initializeRegions(CommandTest.java:335)
        - locked <0x00000000e1216488> (a java.lang.Class for management.test.cli.CommandTest)
        at management.test.cli.CommandTest.HydraTask_configurableInit(CommandTest.java:259)
        - locked <0x00000000e1216488> (a java.lang.Class for management.test.cli.CommandTest)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at hydra.MethExecutor.execute(MethExecutor.java:173)
        at hydra.MethExecutor.execute(MethExecutor.java:141)
        at hydra.TestTask.execute(TestTask.java:197)
        at hydra.RemoteTestModule$1.run(RemoteTestModule.java:213)
{noformat}

This problem is present in all versions of Geode to date (through v1.14), so it 
probably doesn't need to be backported; however, it does cause a hang during 
restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
