[ https://issues.apache.org/jira/browse/GEODE-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Hanson closed GEODE-7031.
------------------------------
Transition from Resolved to Closed for Apache Geode 1.11.0 RC4 release.
> Attempts to send messages to alert listeners delays network partition
> detection
> -------------------------------------------------------------------------------
>
> Key: GEODE-7031
> URL: https://issues.apache.org/jira/browse/GEODE-7031
> Project: Geode
> Issue Type: Improvement
> Components: membership
> Reporter: Bruce J Schuchardt
> Assignee: Bruce J Schuchardt
> Priority: Major
> Fix For: 1.11.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> In a number of recent regression test runs in AWS we have seen network
> partition detection tests fail to detect the partition in a reasonable amount
> of time. Logs show membership services attempting to send alerts to other
> processes that are no longer reachable. Each attempt takes 6 * the
> member-timeout setting, which is 30 seconds per attempt. Alert notification
> is built into the logging system that membership services have to use, so it
> would be nice to have a separate connection-formation timeout for sending
> alerts. The alert system also depends on membership services functioning
> properly, which creates a circular dependency that has historically caused
> hangs and delays such as the one described here.
> {noformat}
> [debug 2019/07/29 14:35:03.824 PDT <Geode Failure Detection thread 5>
> tid=0xc3] Sending (Alert "Unable to send message to
> 10.32.108.136(gemfire3_host2_12249:12249)<v3>:41003" level WARNING) to 1
> peers ([10.32.108.136(gemfire4_host2_12220:12220:locator)<ec><v1>:41001]) via
> tcp/ip
> [debug 2019/07/29 14:35:03.825 PDT <Geode Failure Detection thread 5>
> tid=0xc3] created PendingConnection
> org.apache.geode.internal.tcp.ConnectionTable$PendingConnection@4f4c8630
> created by Geode Failure Detection thread 5
> [info 2019/07/29 14:35:33.847 PDT <Geode Failure Detection thread 5>
> tid=0xc3] Connection: shared=true ordered=true failed to connect to peer
> 10.32.108.136(gemfire4_host2_12220:12220:locator)<ec><v1>:41001 because:
> java.net.SocketTimeoutException
> [debug 2019/07/29 14:35:33.852 PDT <Geode Failure Detection thread 5>
> tid=0xc3] Giving up connecting to alert listener
> 10.32.108.136(gemfire4_host2_12220:12220:locator)<ec><v1>:41001{noformat}
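The 30-second figure in the description can be checked against the log excerpt above. The following is only a sketch: the class and method names are hypothetical and not part of Geode, and the 5-second member-timeout is an assumption derived from the "6 * member-timeout = 30 seconds" statement in the issue text. It compares the expected duration of one attempt with the gap between the "created PendingConnection" and "failed to connect" log entries.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for checking the numbers in the issue; not Geode code.
final class AlertTimeoutCheck {
    // Timestamp layout used in the log excerpt, without the zone abbreviation.
    private static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("uuuu/MM/dd HH:mm:ss.SSS");

    // Multiplier quoted in the issue: each attempt takes 6 * member-timeout.
    static final int ATTEMPT_MULTIPLIER = 6;

    /** Expected duration of one connection attempt, in milliseconds. */
    static long expectedAttemptMillis(long memberTimeoutMillis) {
        return ATTEMPT_MULTIPLIER * memberTimeoutMillis;
    }

    /** Milliseconds between two log timestamps such as "2019/07/29 14:35:03.825". */
    static long elapsedMillis(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, LOG_TIME);
        LocalDateTime to = LocalDateTime.parse(end, LOG_TIME);
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        // Assumption: member-timeout of 5 s, implied by "6 * member-timeout = 30 s".
        long memberTimeoutMillis = 5_000;
        long expected = expectedAttemptMillis(memberTimeoutMillis);
        // Timestamps copied from the two log entries that bracket the attempt.
        long observed = elapsedMillis("2019/07/29 14:35:03.825", "2019/07/29 14:35:33.847");
        System.out.println("expected per attempt: " + expected + " ms");
        System.out.println("observed in log:      " + observed + " ms");
    }
}
```

Under these assumptions the expected value is 30000 ms and the observed gap is 30022 ms, which is consistent with the failure detection thread being blocked for one full connection attempt before it gave up on the alert listener.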
--
This message was sent by Atlassian Jira
(v8.3.4#803005)