[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531799#comment-15531799
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1748:
-------------------------------------------

GitHub user gnethercutt opened a pull request:

    https://github.com/apache/zookeeper/pull/83

    enable TCP keepalive for the leadership election/quorum socket

    Use TCP keep-alives for election/quorum peer connections.
    
    This is the smallest change that addresses
    [ZOOKEEPER-1748](https://issues.apache.org/jira/browse/ZOOKEEPER-1748), and is
    required to avoid silent packet delivery failures on long-lived connections
    in AWS (among other environments).
    
    See also:
    - [VPC security group connection tracking](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#security-group-connection-tracking)
    - [Using TCP keepalives](http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html)
    - [ZooKeeper internals](https://zookeeper.apache.org/doc/r3.4.8/zookeeperInternals.html)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gnethercutt/zookeeper election_tcp_keepalive

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/83.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #83
    
----
commit bcab41003d91dc121e368337317710c2434bece8
Author: Glenn Nethercutt <[email protected]>
Date:   2016-09-28T18:22:41Z

    enable TCP keepalive for the leadership election/quorum socket

----


> TCP keepalive for leader election connections
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1748
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1748
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: leaderElection
>    Affects Versions: 3.4.5, 3.5.0
>         Environment: Linux, Java 1.7
>            Reporter: Antal Sasvári
>            Assignee: Daniel Peon
>            Priority: Minor
>             Fix For: 3.5.3, 3.6.0
>
>         Attachments: Zookeeper-1748-add_tcp_keepalive.patch
>
>
> In our system we encountered the following problem:
> If the system is stable and there is no leader election, the leader election 
> port connections stay open for a very long time without any packets being 
> sent on them.
> Some network elements silently drop the established TCP connection after a 
> timeout if there are no packets being sent on it. In this case the ZK servers 
> will not notice the connection loss. This causes additional delay later when 
> the next leader election is started, as the TCP connections are not alive any 
> more.
> We would like to be able to enable TCP keepalive on the leader election 
> sockets in order to prevent the connection timeout in some network elements 
> due to connection inactivity.
> This could be controlled by adding a new config parameter called tcpKeepAlive 
> in the ZooKeeper configuration file. It would apply only to algorithm 3 
> (TCP-based fast leader election) and would default to false.
> If tcpKeepAlive is set to true, the TCP keepalive flag should be enabled for 
> the leader election sockets in QuorumCnxManager.setSockOpts() by calling 
> sock.setKeepAlive(true).
> We have tested this change successfully in our environment.
> Please comment whether you see any problem with this. If not, I am going to 
> submit a patch.
> I've been told that Apache ActiveMQ, for example, also has a config option 
> for a similar purpose, called transport.keepalive.
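
The change proposed in the issue description above can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual patch: the class name KeepAliveSketch and the boolean parameter on setSockOpts are assumptions made for the example, and the real QuorumCnxManager.setSockOpts() may have a different signature and set other options as well. Only Socket.setKeepAlive() and Socket.getKeepAlive() are standard java.net calls.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical stand-in for the relevant part of QuorumCnxManager.
public class KeepAliveSketch {

    // Applies the keepalive flag to an election socket.
    // tcpKeepAlive mirrors the proposed config parameter (default false);
    // how it is read from the configuration file is not shown here.
    static void setSockOpts(Socket sock, boolean tcpKeepAlive) throws SocketException {
        sock.setKeepAlive(tcpKeepAlive);
    }

    public static void main(String[] args) throws IOException {
        // An unconnected socket is enough to show the option being set.
        try (Socket sock = new Socket()) {
            setSockOpts(sock, true);
            System.out.println("SO_KEEPALIVE enabled: " + sock.getKeepAlive());
        }
    }
}
```

Note that setKeepAlive(true) only turns on SO_KEEPALIVE. When the first probe is sent and how often probes repeat are controlled by the operating system (on Linux, the net.ipv4.tcp_keepalive_* settings), so those values may need to be lower than the idle timeout of the network element that drops the connection.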



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
