Bruce J Schuchardt created GEODE-8721:
-----------------------------------------
Summary: member that should become coordinator never detects loss
of current coordinator
Key: GEODE-8721
URL: https://issues.apache.org/jira/browse/GEODE-8721
Project: Geode
Issue Type: Bug
Components: membership
Reporter: Bruce J Schuchardt
During a network partition, a server that should have become the membership
coordinator and shut down its side of the partition never detected the loss of
the current coordinator on the other side of the partition. Instead, it
repeatedly performed availability checks on that member, and every check
passed. Its log file reported steadily increasing timestamps for when the other
member had supposedly contacted it, which was impossible given the network
partition (created by manipulating iptables).
At least one other server on its side of the network partition was doing the
same thing. It looks like they were somehow interfering with each other's
availability checks.
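One mechanism that would produce this pattern can be sketched as follows. This is a hypothetical model, not Geode's actual code: it assumes each member keeps a per-peer last-contact time, an availability check passes if that time falls within a freshness window, and a member whose check passes notifies its peers that it no longer suspects the member. If the receiver wrongly credits the arrival of that notice to the suspect instead of to the peer that sent it, two members on the same side keep refreshing each other's timestamp for the suspect. The `Member` class, the `WINDOW` value, the 3-second check interval and the notice handler are all illustrative assumptions.

```python
from dataclasses import dataclass, field

WINDOW = 5.0  # hypothetical freshness window, in seconds


@dataclass
class Member:
    """Hypothetical model of one member's failure-detection state."""
    name: str
    last_contact: dict = field(default_factory=dict)

    def check(self, suspect: str, now: float) -> bool:
        # Pass if traffic attributed to `suspect` was seen within WINDOW.
        seen = self.last_contact.get(suspect)
        return seen is not None and now - seen <= WINDOW

    def on_notice(self, sender: str, about: str, now: float, buggy: bool) -> None:
        # A peer reports that it no longer suspects `about`.
        # Buggy variant: the arrival is credited to `about` (the suspect).
        # Correct variant: it is credited to the peer that actually sent it.
        key = about if buggy else sender
        self.last_contact[key] = now


def simulate(buggy: bool, rounds: int = 6) -> list:
    """Members A and B, on one side of a partition, alternately check X.

    X's last real message arrived at t=0; after that X is unreachable.
    Returns the outcome of each check.
    """
    a, b = Member("A"), Member("B")
    for m in (a, b):
        m.last_contact["X"] = 0.0
    results = []
    checker, peer = a, b
    for i in range(rounds):
        now = 3.0 + 3.0 * i  # one check every 3 s, starting at t=3
        passed = checker.check("X", now)
        results.append(passed)
        if passed:
            # The checker announces it stopped suspecting X; the peer receives it.
            peer.on_notice(sender=checker.name, about="X", now=now, buggy=buggy)
        checker, peer = peer, checker
    return results


if __name__ == "__main__":
    print(simulate(buggy=True))   # [True, True, True, True, True, True]
    print(simulate(buggy=False))  # [True, False, False, False, False, False]
```

With `buggy=True`, one legitimate pass (X's last real message is still inside the window) seeds a loop in which every later check passes and the recorded contact time keeps advancing; with `buggy=False`, checks fail as soon as the window expires. The log excerpt shows a similar cadence (a passed check on one member followed by "No longer suspecting" on another), but it does not by itself confirm this mechanism.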
{noformat}
locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:12 PDT 2020
locatorp1_26023/system.log: [info 2020/10/20 22:23:16.228 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check passed for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
bridgep1_25995/system.log: [info 2020/10/20 22:23:16.229 PDT <unicast receiver,rs-F21040449a0i3large-72-61636> tid=0x23] No longer suspecting 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:14 PDT 2020
bridgep1_25998/system.log: [info 2020/10/20 22:23:17.213 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check passed for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
locatorp1_26023/system.log: [info 2020/10/20 22:23:17.232 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Performing availability check for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
bridgep1_25998/system.log: [info 2020/10/20 22:23:18.215 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Performing availability check for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
bridgep1_25995/system.log: [info 2020/10/20 22:23:21.006 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-61636> tid=0x21] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:16 PDT 2020
{noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)