[
https://issues.apache.org/jira/browse/GEODE-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Owen Nichols closed GEODE-6570.
-------------------------------
> processing of cached join request delays view installation
> ----------------------------------------------------------
>
> Key: GEODE-6570
> URL: https://issues.apache.org/jira/browse/GEODE-6570
> Project: Geode
> Issue Type: Bug
> Components: membership
> Reporter: Bruce Schuchardt
> Assignee: Bruce Schuchardt
> Priority: Major
> Fix For: 1.9.0
>
>
> In a test that kills and restarts locators, one of the restarting locators
> times out trying to join the distributed system. Logs show that another
> locator was becoming the membership coordinator and was delayed in sending
> out a membership view when it processed a different join request for a member
> that was already in the distributed system.
> Sequence of events:
> 1. Locator A gets a join request from node 1 and sends a PREPARE.
> 2. Node 1 sets its identity's view ID using the PREPAREd view.
> 3. Locator A is killed.
> 4. Node 1 sends a join request to locator B. Its identity has a view ID set.
> 5. Node 2 sends a join request to locator B and gets a PREPARE.
> 6. Locator B processes node 1's join request and assigns a new view ID to it.
> 7. Locator B processes node 2's join request and assigns a new view ID to it.
> 8. Locator B sends the PREPARE with these two new nodes. It also has node 1's
>    original ID.
> 9. Locator B times out waiting for a response from node 1 with the new view
>    ID and declares it crashed. It sends out a new PREPARE without that
>    address.
> 10. Node 2 gives up waiting.
> 11. Locator B gets no response from node 2 and declares it crashed, sends out
>     a new PREPARE without node 2, and succeeds.
> Here are log snippets showing the problem. Process 616 has a JoinRequest
> queued when this locator becomes coordinator. The JoinRequest ID has v46
> already in it, showing that a PREPARE has already been sent with this member
> in it.
> The locator then creates a new View that has process 616's ID in it twice:
> once with v46 and once with v60.
> {noformat}
> locatorgemfire_2_2_29835/system.log: [fine 2019/03/27 22:22:22.817 PDT
> locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba]
> processing request
> JoinRequestMessage(rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v46>:41004)
> failureDetectionPort:43747
> locatorgemfire_2_2_29835/system.log: [fine 2019/03/27 22:22:22.817 PDT
> locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba]
> processing request
> JoinRequestMessage(rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_3_host2_746:746:locator)<ec>:41002)
> failureDetectionPort:52188
> locatorgemfire_2_2_29835/system.log: [info 2019/03/27 22:22:22.818 PDT
> locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba]
> preparing new view
> View[rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_2_host2_29835:29835:locator)<ec><v24>:41001|60]
> members:
> [rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_2_host2_29835:29835:locator)<ec><v24>:41001,
>
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_2_host2_30052:30052)<ec><v25>:41007{lead},
>
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_4_host2_31300:31300:locator)<ec><v29>:41003,
>
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_1_host2_31671:31671:locator)<ec><v41>:41000,
>
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_2_host2_31856:31856)<ec><v42>:41006,
>
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_32560:32560)<ec><v44>:41005,
>
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v46>:41004,
>
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v60>:41004,
>
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_3_host2_746:746:locator)<ec><v60>:41002]
> {noformat}
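
The duplicate entry in the view above can be spotted mechanically. The following
is a minimal sketch in Java, not Geode code: MemberId, addressKey and
duplicateAddresses are hypothetical names made up for illustration, and the
assumption that a process is identified by its name and port regardless of view
ID is mine. It only detects the condition; how the coordinator should resolve it
is left open here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateMemberCheck {

    // Hypothetical stand-in for a member ID: process name, port, and view ID.
    // This is NOT Geode's member identifier class.
    static final class MemberId {
        final String name;
        final int port;
        final int viewId;

        MemberId(String name, int port, int viewId) {
            this.name = name;
            this.port = port;
            this.viewId = viewId;
        }

        // Assumption: name and port identify the process; the view ID does not.
        String addressKey() {
            return name + ":" + port;
        }
    }

    // Returns the address keys that occur more than once in the member list,
    // in the order in which each address was first seen.
    static List<String> duplicateAddresses(List<MemberId> members) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (MemberId m : members) {
            counts.merge(m.addressKey(), 1, Integer::sum);
        }
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                duplicates.add(e.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Three of the members from the view in the log excerpt above.
        List<MemberId> view = Arrays.asList(
            new MemberId("peergemfire_2_1_host2_32560", 41005, 44),
            new MemberId("peergemfire_2_1_host2_616", 41004, 46),
            new MemberId("peergemfire_2_1_host2_616", 41004, 60));

        // Prints [peergemfire_2_1_host2_616:41004]
        System.out.println(duplicateAddresses(view));
    }
}
```

A check like this compares members by address only, so the same process carrying
two different view IDs (v46 and v60 in the log) is reported once.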