[ 
https://issues.apache.org/jira/browse/GEODE-6823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Schuchardt resolved GEODE-6823.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.10.0

> Hang in ElderInitProcessor.init()
> ---------------------------------
>
>                 Key: GEODE-6823
>                 URL: https://issues.apache.org/jira/browse/GEODE-6823
>             Project: Geode
>          Issue Type: Bug
>          Components: distributed lock service
>    Affects Versions: 1.8.0, 1.9.0
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>            Priority: Major
>             Fix For: 1.10.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> A locator and a server were spinning up at the same time and the locator 
> became stuck trying to initialize a distributed lock service.  Extra logging 
> showed that the server received an ElderInitMessage that it decided to ignore 
> because it thought it was shutting down.
>  
> {noformat}
> gemfire2_2430/system.log: [info 2019/05/29 11:00:34.230 PDT 
> <vm_2_thr_40_client2_host1_2430> tid=0x24] Initial (distribution manager) 
> view, 
> View[rs-GEM-2316-0906a2i3large-hydra-client-11(2354:locator)<ec><v0>:41000|2] 
> members: 
> [rs-GEM-2316-0906a2i3large-hydra-client-11(2354:locator)<ec><v0>:41000, 
> rs-GEM-2316-0906a2i3large-hydra-client-11(gemfire1_host1_2416:2416)<ec><v1>:41001{lead},
>  
> rs-GEM-2316-0906a2i3large-hydra-client-11(gemfire1_host1_2420:2420)<ec><v2>:41002,
>  
> rs-GEM-2316-0906a2i3large-hydra-client-11(gemfire2_host1_2430:2430)<ec><v2>:41003]
> gemfire2_2430/system.log: [debug 2019/05/29 11:00:34.463 PDT <unicast 
> receiver,rs-GEM-2316-0906a2i3large-hydra-client-11-10705> tid=0x46] Received 
> message 'ElderInitMessage (processorId='1)' from 
> <rs-GEM-2316-0906a2i3large-hydra-client-11(2354:locator)<ec><v0>:41000>
> gemfire2_2430/system.log: [debug 2019/05/29 11:00:34.574 PDT <Pooled Message 
> Processor 2> tid=0x4d] Waiting for Elder to change. Expecting Elder to be 
> rs-GEM-2316-0906a2i3large-hydra-client-11(2354:locator)<ec><v0>:41000, is 
> rs-GEM-2316-0906a2i3large-hydra-client-11(2354:locator)<ec><v0>:41000.
> gemfire2_2430/system.log: [info 2019/05/29 11:00:34.575 PDT <Pooled Message 
> Processor 2> tid=0x4d] ElderInitMessage (processorId='1): disregarding 
> request from departed member.
> gemfire2_2430/system.log: [info 2019/05/29 11:00:35.238 PDT <unicast 
> receiver,rs-GEM-2316-0906a2i3large-hydra-client-11-10705> tid=0x46] received 
> new view: 
> View[rs-GEM-2316-0906a2i3large-hydra-client-11(2354:locator)<ec><v0>:41000|3] 
> members: 
> [rs-GEM-2316-0906a2i3large-hydra-client-11(2354:locator)<ec><v0>:41000, 
> rs-GEM-2316-0906a2i3large-hydra-client-11(gemfire1_host1_2416:2416)<ec><v1>:41001{lead},
>  
> rs-GEM-2316-0906a2i3large-hydra-client-11(gemfire1_host1_2420:2420)<ec><v2>:41002,
>  
> rs-GEM-2316-0906a2i3large-hydra-client-11(gemfire2_host1_2430:2430)<ec><v2>:41003,
>  
> rs-GEM-2316-0906a2i3large-hydra-client-11(gemfire2_host1_2437:2437)<ec><v3>:41004]
> locator_ds_2354/system.log: [warn 2019/05/29 11:00:50.430 PDT <Pooled Message 
> Processor 2> tid=0x38] 15 seconds have elapsed while waiting for replies: 
> <ElderInitProcessor 1 waiting for 1 replies from 
> [rs-GEM-2316-0906a2i3large-hydra-client-11(gemfire2_host1_2430:2430)<ec><v2>:41003]>
>  on rs-GEM-2316-0906a2i3large-hydra-client-11(2354:locator)<ec><v0>:41000 
> whose current membership list is: 
> [[rs-GEM-2316-0906a2i3large-hydra-client-11(gemfire2_host1_2437:2437)<ec><v3>:41004,
>  
> rs-GEM-2316-0906a2i3large-hydra-client-11(gemfire2_host1_2430:2430)<ec><v2>:41003,
>  rs-GEM-2316-0906a2i3large-hydra-client-11(2354:locator)<ec><v0>:41000, 
> rs-GEM-2316-0906a2i3large-hydra-client-11(gemfire1_host1_2420:2420)<ec><v2>:41002,
>  
> rs-GEM-2316-0906a2i3large-hydra-client-11(gemfire1_host1_2416:2416)<ec><v1>:41001]]
> [Stack #1 from bgexec15197_2354.log line 2]
> "Pooled Message Processor 2" #56 daemon prio=5 os_prio=0 
> tid=0x000000000194e800 nid=0xae3 waiting on condition [0x00007f5c94dce000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x0000000775ff4f08> (a 
> java.util.concurrent.CountDownLatch$Sync)
>       at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>       at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>       at 
> org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:71)
>       at 
> org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:716)
>       at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:787)
>       at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:764)
>       at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:850)
>       at 
> org.apache.geode.distributed.internal.locks.ElderInitProcessor.init(ElderInitProcessor.java:69)
>       at 
> org.apache.geode.distributed.internal.locks.ElderState.<init>(ElderState.java:53)
>       at 
> org.apache.geode.distributed.internal.ClusterElderManager.lambda$new$0(ClusterElderManager.java:41)
>       at 
> org.apache.geode.distributed.internal.ClusterElderManager$$Lambda$64/1182435120.get(Unknown
>  Source)
>       at 
> org.apache.geode.distributed.internal.ClusterElderManager.initializeElderState(ClusterElderManager.java:107)
>       at 
> org.apache.geode.distributed.internal.ClusterElderManager.getElderState(ClusterElderManager.java:98)
>       at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.getElderState(ClusterDistributionManager.java:2885)
>       at 
> org.apache.geode.distributed.internal.locks.GrantorRequestProcessor$GrantorRequestMessage.basicProcess(GrantorRequestProcessor.java:494)
>       at 
> org.apache.geode.distributed.internal.locks.GrantorRequestProcessor$GrantorRequestMessage.process(GrantorRequestProcessor.java:489)
>       at 
> org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:369)
>       at 
> org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:435)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:959)
>       at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.doProcessingThread(ClusterDistributionManager.java:825)
>       at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$77/482261794.invoke(Unknown
>  Source)
>       at 
> org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
>       at 
> org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$78/2015475584.run(Unknown
>  Source)
>       at java.lang.Thread.run(Thread.java:748)
> so 2430 did not respond to the ElderInitMessage.  That would happen if 2430 
> was shutting down, if 2354 was no longer a member or if 2430 became the elder.
> {noformat}
> I added more logging to ClusterElderManager.waitForElder(), which is 
> returning false and causing this "disregarding" message to be logged. What I 
> see is that the server doing this thinks it's shutting down but it most 
> definitely is not:
> {noformat}
> [debug 2019/05/29 15:58:48.488 PDT <unicast 
> receiver,rs-GEM-2316-1539a0i3large-hydra-client-4-39597> tid=0x46] Received 
> message 'ElderInitMessage (processorId=1)' from 
> <rs-GEM-2316-1539a0i3large-hydra-client-4(15224:locator)<ec><v0>:41000>
> [debug 2019/05/29 15:58:48.510 PDT <Pooled Message Processor 2> tid=0x4d] 
> Wait for elder terminated because we're shutting down
> [info 2019/05/29 15:58:48.511 PDT <Pooled Message Processor 2> tid=0x4d] 
> ElderInitMessage (processorId=1): disregarding request from departed member.
> [debug 2019/05/29 15:58:48.599 PDT <unicast 
> receiver,rs-GEM-2316-1539a0i3large-hydra-client-4-39597> tid=0x46] processing 
> InstallViewMessage(type=PREPARE; Current ViewID=3; Previous View ID=0; 
> View[rs-GEM-2316-1539a0i3large-hydra-client-4(15224:locator)<ec><v0>:41000|3] 
> members: 
> [rs-GEM-2316-1539a0i3large-hydra-client-4(15224:locator)<ec><v0>:41000, 
> rs-GEM-2316-1539a0i3large-hydra-client-4(gemfire1_host1_15283:15283)<ec><v1>:41001{lead},
>  
> rs-GEM-2316-1539a0i3large-hydra-client-4(gemfire1_host1_15290:15290)<ec><v2>:41002,
>  
> rs-GEM-2316-1539a0i3large-hydra-client-4(gemfire2_host1_15295:15295)<ec><v3>:41004,
>  
> rs-GEM-2316-1539a0i3large-hydra-client-4(gemfire2_host1_15304:15304)<ec><v3>:41003];
>  cred=null){noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to