Bruce Schuchardt created GEODE-6423:
---------------------------------------
Summary: availability checks sometimes immediately initiate removal
Key: GEODE-6423
URL: https://issues.apache.org/jira/browse/GEODE-6423
Project: Geode
Issue Type: Bug
Components: membership
Reporter: Bruce Schuchardt
If the network goes down, the JGroupsMessenger service initiates suspect
processing when it tries to send messages. In 1.8 this appears to cause
immediate removal of the suspect:
# An IOException while sending a UDP message initiates suspicion.
# Suspect processing initiates a final check.
# The final check fails immediately: it uses a timed Socket.connect(), which
fails at once instead of waiting for the timeout.
# The member is declared dead.
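The sequence above can be reproduced without Geode. The sketch below is hypothetical (`OneShotFinalCheck` and `closedLoopbackPort` are illustrative names, not Geode's failure-detection code): it makes a single `Socket.connect(addr, timeout)` attempt and treats any `IOException` as proof that the member is dead. A loopback port with no listener stands in for the downed network, since both produce an immediate `IOException` rather than a timeout. `connect()` throws long before the timeout expires, so the check that was meant to be final ends within milliseconds.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch, not Geode's code: a "final check" that makes exactly one
// timed TCP connect attempt and treats any IOException as proof of death.
class OneShotFinalCheck {

    /** Returns true only if a TCP connection to addr succeeds within timeoutMs. */
    static boolean check(InetSocketAddress addr, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // timeoutMs is only an upper bound: connect() can also throw at once,
            // e.g. "Connection refused" or "Network is unreachable".
            socket.connect(addr, timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // one failed attempt ends the check
        }
    }

    /** Returns a loopback port with no listener, to simulate an unreachable peer. */
    static int closedLoopbackPort() {
        // ss is closed on exit, so later connects to this port are refused.
        try (ServerSocket ss = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return ss.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InetSocketAddress peer =
            new InetSocketAddress(InetAddress.getLoopbackAddress(), closedLoopbackPort());
        long start = System.nanoTime();
        boolean alive = check(peer, 5000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("alive=" + alive + ", decided after " + elapsedMs
            + " ms of a 5000 ms timeout");
    }
}
```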
{noformat}
[info 2019/02/13 17:44:59.366 CST perf157-130-167-server1 <Geode Failure
Detection thread 3> tid=0xc2] received suspect message from myself for
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000: Unable
to send messages to this member via JGroups
[info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure
Detection thread 4> tid=0xc3] Performing final check for suspect member
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
reason=Unable to send messages to this member via JGroups
[info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure
Detection thread 5> tid=0xc4] Performing final check for suspect member
192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 reason=Unable to send
messages to this member via JGroups
[info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure
Detection thread 4> tid=0xc3] Failure detection is now watching
192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
[info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure
Detection thread 5> tid=0xc4] Failure detection is now watching
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
[info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure
Detection thread 3> tid=0xc2] received suspect message from myself for
192.168.130.167(perf157-130-167-server2:225522)<v2>:16201: Unable to send
messages to this member via JGroups
[info 2019/02/13 17:44:59.369 CST perf157-130-167-server1 <Geode Failure
Detection thread 6> tid=0xc5] Performing final check for suspect member
192.168.130.167(perf157-130-167-server2:225522)<v2>:16201 reason=Unable to send
messages to this member via JGroups
[info 2019/02/13 17:44:59.369 CST perf157-130-167-server1 <Geode Failure
Detection thread 6> tid=0xc5] Failure detection is now watching
192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
[info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure
Detection thread 5> tid=0xc4] Final check failed for member
192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202
[info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure
Detection thread 5> tid=0xc4] Requesting removal of suspect member
192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202
[info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure
Detection thread 4> tid=0xc3] Final check failed for member
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
[info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure
Detection thread 4> tid=0xc3] Requesting removal of suspect member
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
[info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure
Detection thread 4> tid=0xc3] This member is becoming the membership
coordinator with address
192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
[info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure
Detection thread 6> tid=0xc5] Final check failed for member
192.168.130.167(perf157-130-167-server2:225522)<v2>:16201
[info 2019/02/13 17:44:59.373 CST perf157-130-167-server1 <Geode Failure
Detection thread 6> tid=0xc5] Requesting removal of suspect member
192.168.130.167(perf157-130-167-server2:225522)<v2>:16201
[info 2019/02/13 17:44:59.376 CST perf157-130-167-server1 <Geode Failure
Detection thread 4> tid=0xc3] ViewCreator starting
on:192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
[info 2019/02/13 17:44:59.376 CST perf157-130-167-server1 <Geode Membership
View Creator> tid=0xc6] View Creator thread is starting
[info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership
View Creator> tid=0xc6]
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 had a
weight of 3
[info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership
View Creator> tid=0xc6]
192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 had a weight of 10
[info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership
View Creator> tid=0xc6] preparing new view
View[192.168.130.167(perf157-130-167-server1:225263)<v1>:16200|10] members:
[192.168.130.167(perf157-130-167-server1:225263)<v1>:16200{lead},
192.168.130.167(perf157-130-167-server2:225522)<v2>:16201] crashed:
[192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000,
192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202]
[info 2019/02/13 17:45:03.627 CST perf157-130-167-server1 <unicast
receiver,perf157-130-167-62066> tid=0x21] received suspect message from
192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 for
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000: Unable
to send messages to this member via JGroups
[info 2019/02/13 17:45:03.718 CST perf157-130-167-server1 <unicast
receiver,perf157-130-167-62066> tid=0x21] Membership received a request to
remove 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200 from
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
reason=Unable to send messages to this member via JGroups
[severe 2019/02/13 17:45:03.719 CST perf157-130-167-server1 <unicast
receiver,perf157-130-167-62066> tid=0x21] Membership service failure: Unable to
send messages to this member via JGroups
org.apache.geode.ForcedDisconnectException: Unable to send messages to this
member via JGroups
{noformat}
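The timestamps in the log show how short the final checks were. The snippet below does the arithmetic for the three members whose final checks failed, pairing each "Performing final check" line with the matching "Final check failed" line. `FinalCheckTiming` is a throwaway helper, and the 5000 ms figure is an assumption (Geode's documented default for member-timeout), because the report does not state the configured value.

```java
import java.time.Duration;
import java.time.LocalTime;

// Throwaway helper: elapsed time between two log timestamps (same day assumed).
class FinalCheckTiming {

    static long elapsedMs(String started, String failed) {
        return Duration.between(LocalTime.parse(started), LocalTime.parse(failed)).toMillis();
    }

    public static void main(String[] args) {
        long assumedMemberTimeoutMs = 5000; // assumption: Geode's default member-timeout
        // {member, "Performing final check" time, "Final check failed" time}
        String[][] checks = {
            {"locator1", "17:44:59.368", "17:44:59.371"},
            {"worker1", "17:44:59.368", "17:44:59.371"},
            {"server2", "17:44:59.369", "17:44:59.371"},
        };
        for (String[] c : checks) {
            System.out.println(c[0] + ": final check failed after " + elapsedMs(c[1], c[2])
                + " ms (assumed member-timeout " + assumedMemberTimeoutMs + " ms)");
        }
    }
}
```

Each check ends after 2 to 3 ms, a tiny fraction of the assumed 5000 ms member-timeout.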
We expect the final check to respect the member-timeout setting instead of
declaring the member dead as soon as one connect attempt fails.
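One way to meet that expectation is to treat a fast connect failure as inconclusive and keep retrying until member-timeout has elapsed. A minimal sketch, assuming a plain TCP probe; `DeadlineFinalCheck` and its `retryIntervalMs` parameter are hypothetical names, not part of Geode's API.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a final check that only gives up once the whole
// member-timeout has elapsed, retrying after fast connect failures.
class DeadlineFinalCheck {

    static boolean check(InetSocketAddress addr, long memberTimeoutMs, long retryIntervalMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(memberTimeoutMs);
        while (true) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // member-timeout elapsed without a successful connect
            }
            try (Socket socket = new Socket()) {
                // remainingMs >= 1 here; a timeout of 0 would mean "wait forever".
                socket.connect(addr, (int) Math.min(Integer.MAX_VALUE, remainingMs));
                return true;
            } catch (IOException e) {
                // Fast failures (refused, unreachable) land here: retry, don't conclude.
            }
            long sleepMs = Math.min(retryIntervalMs,
                TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime()));
            if (sleepMs > 0) {
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false; // sketch simplification: stop checking when interrupted
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        int closedPort;
        try (ServerSocket ss = new ServerSocket(0, 50, loopback)) {
            closedPort = ss.getLocalPort();
        } // nothing listens on closedPort once ss is closed
        long start = System.nanoTime();
        boolean alive = check(new InetSocketAddress(loopback, closedPort), 300, 50);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("alive=" + alive + " after " + elapsedMs + " ms");
    }
}
```

With this shape a member is declared dead only after the full member-timeout passes without a successful connect. The trade-off is that a member that really has crashed, and therefore refuses connections immediately, also takes the full member-timeout to be removed.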
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)