Bruce Schuchardt created GEODE-7031:
---------------------------------------
Summary: Attempts to send messages to alert listeners delay
network partition detection
Key: GEODE-7031
URL: https://issues.apache.org/jira/browse/GEODE-7031
Project: Geode
Issue Type: Improvement
Components: membership
Reporter: Bruce Schuchardt
In a number of recent regression test runs in AWS, network partition detection
tests have failed to detect the partition in a reasonable amount of time. Logs
show membership services attempting to send alerts to other processes that are
no longer reachable. Each attempt takes 6 * the member-timeout setting before
it gives up, which is 30 seconds per attempt. It would be better to use a
separate, shorter connection-formation timeout for alert messages, since alert
notification is built into the logging system that membership services have to
use. Because the alert system in turn depends on membership services
functioning properly, this creates a circular dependency that has historically
caused hangs and delays such as the one described here.
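
The arithmetic above, and one possible shape for a separate alert timeout, can
be sketched as follows. This is only an illustration, not Geode code: the class
and method names are hypothetical, the 5 second member-timeout is inferred from
the "6 * member-timeout = 30 seconds" figure, and capping alert connections at
a single member-timeout interval is an assumed policy, not a decided fix.

```java
import java.time.Duration;

public class AlertConnectTimeouts {
    // Multiplier taken from the issue text: one attempt takes 6 * member-timeout.
    public static final int MEMBER_TIMEOUT_MULTIPLIER = 6;

    /** Time a single connection attempt takes today, per the description. */
    public static Duration currentConnectTimeout(Duration memberTimeout) {
        return memberTimeout.multipliedBy(MEMBER_TIMEOUT_MULTIPLIER);
    }

    /**
     * Hypothetical policy: alert messages get one member-timeout interval to
     * form a connection; all other messages keep the existing timeout.
     */
    public static Duration connectTimeoutFor(boolean isAlert, Duration memberTimeout) {
        return isAlert ? memberTimeout : currentConnectTimeout(memberTimeout);
    }

    public static void main(String[] args) {
        // Assumed value: 30 seconds / 6 = 5 seconds.
        Duration memberTimeout = Duration.ofSeconds(5);
        System.out.println("current: "
            + currentConnectTimeout(memberTimeout).getSeconds() + "s");
        System.out.println("alert (hypothetical): "
            + connectTimeoutFor(true, memberTimeout).getSeconds() + "s");
    }
}
```

Under these assumptions a failure detection thread would be blocked for 5
seconds per unreachable alert listener instead of 30.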
{noformat}
[debug 2019/07/29 14:35:03.824 PDT <Geode Failure Detection thread 5> tid=0xc3]
Sending (Alert "Unable to send message to
10.32.108.136(gemfire3_host2_12249:12249)<v3>:41003" level WARNING) to 1 peers
([10.32.108.136(gemfire4_host2_12220:12220:locator)<ec><v1>:41001]) via tcp/ip
[debug 2019/07/29 14:35:03.825 PDT <Geode Failure Detection thread 5> tid=0xc3]
created PendingConnection
org.apache.geode.internal.tcp.ConnectionTable$PendingConnection@4f4c8630
created by Geode Failure Detection thread 5
[info 2019/07/29 14:35:33.847 PDT <Geode Failure Detection thread 5> tid=0xc3]
Connection: shared=true ordered=true failed to connect to peer
10.32.108.136(gemfire4_host2_12220:12220:locator)<ec><v1>:41001 because:
java.net.SocketTimeoutException
[debug 2019/07/29 14:35:33.852 PDT <Geode Failure Detection thread 5> tid=0xc3]
Giving up connecting to alert listener
10.32.108.136(gemfire4_host2_12220:12220:locator)<ec><v1>:41001{noformat}