Bruce Schuchardt created GEODE-6423:
---------------------------------------

             Summary: availability checks sometimes immediately initiate removal
                 Key: GEODE-6423
                 URL: https://issues.apache.org/jira/browse/GEODE-6423
             Project: Geode
          Issue Type: Bug
          Components: membership
            Reporter: Bruce Schuchardt


If the network goes down, the JGroupsMessenger service initiates suspect 
processing when it tries to send messages.  In 1.8 this seems to lead to 
immediate removal of the suspect:

an IOException while sending a UDP message initiates suspicion

suspect processing initiates a final check

the final check fails immediately (it uses a timed Socket.connect(), which 
fails at once instead of waiting for the timeout)

the member is declared dead
{noformat}
[info 2019/02/13 17:44:59.366 CST perf157-130-167-server1 <Geode Failure 
Detection thread 3> tid=0xc2] received suspect message from myself for 
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000: Unable 
to send messages to this member via JGroups

[info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure 
Detection thread 4> tid=0xc3] Performing final check for suspect member 
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 
reason=Unable to send messages to this member via JGroups

[info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure 
Detection thread 5> tid=0xc4] Performing final check for suspect member 
192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 reason=Unable to send 
messages to this member via JGroups

[info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure 
Detection thread 4> tid=0xc3] Failure detection is now watching 
192.168.130.167(perf157-130-167-server1:225263)<v1>:16200

[info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure 
Detection thread 5> tid=0xc4] Failure detection is now watching 
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000

[info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure 
Detection thread 3> tid=0xc2] received suspect message from myself for 
192.168.130.167(perf157-130-167-server2:225522)<v2>:16201: Unable to send 
messages to this member via JGroups

[info 2019/02/13 17:44:59.369 CST perf157-130-167-server1 <Geode Failure 
Detection thread 6> tid=0xc5] Performing final check for suspect member 
192.168.130.167(perf157-130-167-server2:225522)<v2>:16201 reason=Unable to send 
messages to this member via JGroups

[info 2019/02/13 17:44:59.369 CST perf157-130-167-server1 <Geode Failure 
Detection thread 6> tid=0xc5] Failure detection is now watching 
192.168.130.167(perf157-130-167-server1:225263)<v1>:16200

[info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure 
Detection thread 5> tid=0xc4] Final check failed for member 
192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202

[info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure 
Detection thread 5> tid=0xc4] Requesting removal of suspect member 
192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202

[info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure 
Detection thread 4> tid=0xc3] Final check failed for member 
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000

[info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure 
Detection thread 4> tid=0xc3] Requesting removal of suspect member 
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000

[info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure 
Detection thread 4> tid=0xc3] This member is becoming the membership 
coordinator with address 
192.168.130.167(perf157-130-167-server1:225263)<v1>:16200

[info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure 
Detection thread 6> tid=0xc5] Final check failed for member 
192.168.130.167(perf157-130-167-server2:225522)<v2>:16201

[info 2019/02/13 17:44:59.373 CST perf157-130-167-server1 <Geode Failure 
Detection thread 6> tid=0xc5] Requesting removal of suspect member 
192.168.130.167(perf157-130-167-server2:225522)<v2>:16201

[info 2019/02/13 17:44:59.376 CST perf157-130-167-server1 <Geode Failure 
Detection thread 4> tid=0xc3] ViewCreator starting 
on:192.168.130.167(perf157-130-167-server1:225263)<v1>:16200

[info 2019/02/13 17:44:59.376 CST perf157-130-167-server1 <Geode Membership 
View Creator> tid=0xc6] View Creator thread is starting

[info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership 
View Creator> tid=0xc6] 
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 had a 
weight of 3

[info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership 
View Creator> tid=0xc6] 
192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 had a weight of 10

[info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership 
View Creator> tid=0xc6] preparing new view 
View[192.168.130.167(perf157-130-167-server1:225263)<v1>:16200|10] members: 
[192.168.130.167(perf157-130-167-server1:225263)<v1>:16200{lead}, 
192.168.130.167(perf157-130-167-server2:225522)<v2>:16201] crashed: 
[192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000, 
192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202]

[info 2019/02/13 17:45:03.627 CST perf157-130-167-server1 <unicast 
receiver,perf157-130-167-62066> tid=0x21] received suspect message from 
192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 for 
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000: Unable 
to send messages to this member via JGroups

[info 2019/02/13 17:45:03.718 CST perf157-130-167-server1 <unicast 
receiver,perf157-130-167-62066> tid=0x21] Membership received a request to 
remove 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200 from 
192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 
reason=Unable to send messages to this member via JGroups

[severe 2019/02/13 17:45:03.719 CST perf157-130-167-server1 <unicast 
receiver,perf157-130-167-62066> tid=0x21] Membership service failure: Unable to 
send messages to this member via JGroups
org.apache.geode.ForcedDisconnectException: Unable to send messages to this 
member via JGroups
{noformat}
 

We expect the final check to respect the member-timeout setting.
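
A minimal sketch of the expected behavior follows. It is not the Geode implementation: the class name FinalCheckSketch, the methods tryConnect and finalCheck, and the retryIntervalMillis parameter are all hypothetical names chosen for illustration. The idea is that a connect attempt which fails immediately is retried until the member-timeout interval has elapsed, so the final check never reports failure sooner than that.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not Geode code: a final check that only reports
// failure once the full member-timeout interval has elapsed.
final class FinalCheckSketch {

    // One timed connect attempt. Returns false on any I/O failure, including
    // one that is raised immediately (for example when the network is down).
    static boolean tryConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Repeats the probe until it succeeds or memberTimeoutMillis has passed.
    // A probe that fails instantly therefore cannot end the check early.
    static boolean finalCheck(BooleanSupplier probe,
                              long memberTimeoutMillis,
                              long retryIntervalMillis) {
        long deadline = System.nanoTime() + memberTimeoutMillis * 1_000_000L;
        while (true) {
            if (probe.getAsBoolean()) {
                return true; // member answered, so it is not dead
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // timeout exhausted, the final check has failed
            }
            long remainingMillis = Math.max(1, remainingNanos / 1_000_000L);
            try {
                Thread.sleep(Math.min(retryIntervalMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

A caller might wrap the socket probe as `FinalCheckSketch.finalCheck(() -> FinalCheckSketch.tryConnect(host, port, 1000), memberTimeout, 100)`, where host, port, memberTimeout and the 1000 ms and 100 ms values are placeholders. Passing the probe as a BooleanSupplier keeps the retry loop testable without a real network.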

 



