Bruce J Schuchardt created GEODE-8721:
-----------------------------------------

             Summary: member that should become coordinator never detects loss of current coordinator
                 Key: GEODE-8721
                 URL: https://issues.apache.org/jira/browse/GEODE-8721
             Project: Geode
          Issue Type: Bug
          Components: membership
            Reporter: Bruce J Schuchardt


During a network partition, a server that should have become membership 
coordinator and shut down its side of the partition never detected the loss of 
a server on the other side of the partition.  Instead, it continually performed 
availability checks on that other server, and the checks passed.  Its log file 
showed continually increasing timestamps for when it claimed the other server 
had contacted it, which was not possible given the network partition (which was 
formed through iptables manipulation).

At least one other server on its side of the network partition was doing the 
same thing.  It looks like they were interfering with each other's availability 
checks in some way.

{noformat}
locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP 
Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check detected 
recent message traffic for suspect member 
10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue 
Oct 20 22:23:12 PDT 2020

locatorp1_26023/system.log: [info 2020/10/20 22:23:16.228 PDT <Geode UDP 
Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check passed for 
suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000


bridgep1_25995/system.log: [info 2020/10/20 22:23:16.229 PDT <unicast 
receiver,rs-F21040449a0i3large-72-61636> tid=0x23] No longer suspecting 
10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000


bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP 
Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check detected 
recent message traffic for suspect member 
10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue 
Oct 20 22:23:14 PDT 2020

bridgep1_25998/system.log: [info 2020/10/20 22:23:17.213 PDT <Geode UDP 
Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check passed for 
suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000


locatorp1_26023/system.log: [info 2020/10/20 22:23:17.232 PDT <Geode UDP 
Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Performing availability check 
for suspect member 
10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 reason=Unable 
to send messages to this member via JGroups


bridgep1_25998/system.log: [info 2020/10/20 22:23:18.215 PDT <Geode UDP 
Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Performing availability check 
for suspect member 
10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 reason=Unable 
to send messages to this member via JGroups


bridgep1_25995/system.log: [info 2020/10/20 22:23:21.006 PDT <Geode UDP 
Timer-2,rs-F21040449a0i3large-72-61636> tid=0x21] Availability check detected 
recent message traffic for suspect member 
10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue 
Oct 20 22:23:16 PDT 2020
{noformat}
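
To compare the claimed contact times across the three logs, a small script can 
pull out each "detected recent message traffic" entry and sort the entries by 
the time they were logged.  This is only a sketch for reading the excerpt above: 
it is not part of Geode, the function name claimed_contact_times is made up for 
illustration, and the pattern assumes entries look exactly like the ones quoted 
here (including the mail client's line wrapping).

```python
import re
from datetime import datetime

# Matches only the "detected recent message traffic" entries, in the format
# quoted in this report (an assumption; other Geode versions may differ).
ENTRY = re.compile(
    r"(?P<log>\S+/system\.log): "
    r"\[info (?P<logged>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \w+ "
    r"<[^>]*> tid=0x[0-9a-f]+\] "
    r"Availability check detected recent message traffic "
    r"for suspect member (?P<member>\S+) at time "
    r"\w{3} (?P<claimed>\w{3} \d{1,2} \d{2}:\d{2}:\d{2}) \w+ (?P<year>\d{4})"
)


def claimed_contact_times(raw):
    """Return (log file, time logged, claimed contact time), sorted by time logged."""
    text = " ".join(raw.split())  # undo line wrapping before matching
    rows = []
    for m in ENTRY.finditer(text):
        logged = datetime.strptime(m["logged"], "%Y/%m/%d %H:%M:%S.%f")
        claimed = datetime.strptime(
            m["claimed"] + " " + m["year"], "%b %d %H:%M:%S %Y"
        )
        rows.append((m["log"], logged, claimed))
    return sorted(rows, key=lambda row: row[1])


# The three matching entries from the excerpt above, copied as wrapped.
SAMPLE = """
locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP
Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check detected
recent message traffic for suspect member
10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue
Oct 20 22:23:12 PDT 2020

bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP
Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check detected
recent message traffic for suspect member
10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue
Oct 20 22:23:14 PDT 2020

bridgep1_25995/system.log: [info 2020/10/20 22:23:21.006 PDT <Geode UDP
Timer-2,rs-F21040449a0i3large-72-61636> tid=0x21] Availability check detected
recent message traffic for suspect member
10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue
Oct 20 22:23:16 PDT 2020
"""

if __name__ == "__main__":
    for log, logged, claimed in claimed_contact_times(SAMPLE):
        print(log, "logged at", logged.time(), "claims contact at", claimed.time())
```

For the entries above, the claimed contact times are 22:23:12, 22:23:14 and 
22:23:16, each a few seconds before the entry was logged, even though the 
partition was already in place.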




--
This message was sent by Atlassian Jira
(v8.3.4#803005)