[jira] [Commented] (ZOOKEEPER-1049) Session expire/close flooding renders heartbeats to delay significantly

Chang Song (JIRA) Tue, 19 Apr 2011 00:49:49 -0700

    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021452#comment-13021452
 ]


Chang Song commented on ZOOKEEPER-1049:
---------------------------------------

I think we found the culprit:

{code}
public NIOServerCnxn(ZooKeeperServer zk, SocketChannel sock,
            SelectionKey sk, Factory factory) throws IOException {
        this.zk = zk;
        this.sock = sock;
        this.sk = sk;
        this.factory = factory;
        sock.socket().setTcpNoDelay(true);
        sock.socket().setSoLinger(true, 2); // socket linger option of 2 seconds
        InetAddress addr = ((InetSocketAddress) sock.socket()
                .getRemoteSocketAddress()).getAddress();
        authInfo.add(new Id("ip", addr.getHostAddress()));
        sk.interestOps(SelectionKey.OP_READ);
    }
{code}

It is the socket linger option.
Since the client's session has already expired, do we really need to wait 2 seconds 
for the client to consume the socket receive buffer?

We set the linger option to 0 so that closing sends a TCP RESET, and the problem went away.
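
For anyone who wants to try the two settings side by side, here is a minimal sketch, not the actual patch. The class LingerSketch and its helpers configure and lingerAfterConfigure are hypothetical names made up for illustration; only Socket.setSoLinger, Socket.getSoLinger and Socket.setTcpNoDelay are real JDK calls. It sets the same two socket options as the constructor above on a loopback connection and reads the linger value back.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class LingerSketch {

    /**
     * Hypothetical helper (not ZooKeeper code): sets the same two socket
     * options as the constructor above, with a caller-supplied linger timeout.
     * With linger enabled and a timeout of 0, close() discards unsent data
     * and resets the connection instead of waiting.
     */
    static void configure(Socket socket, int lingerSeconds) throws IOException {
        socket.setTcpNoDelay(true);
        socket.setSoLinger(true, lingerSeconds);
    }

    /**
     * Opens a loopback connection, configures the accepted (server-side)
     * socket, and returns the SO_LINGER value reported for it.
     */
    static int lingerAfterConfigure(int lingerSeconds) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            configure(accepted, lingerSeconds);
            return accepted.getSoLinger();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("original setting:   " + lingerAfterConfigure(2));
        System.out.println("workaround setting: " + lingerAfterConfigure(0));
    }
}
```

This only shows how the option is set and read back. It does not reproduce the session-close stampede or measure ping latency, so it says nothing by itself about whether linger 0 is the right fix.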

We'll try to test the original case and report back.




> Session expire/close flooding renders heartbeats to delay significantly
> -----------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1049
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1049
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2
>         Environment: CentOS 5.3, three node ZK ensemble
>            Reporter: Chang Song
>            Priority: Critical
>         Attachments: ZookeeperPingTest.zip, zk_ping_latency.pdf
>
>
> Let's say we have 100 clients (group A) already connected to a three-node ZK 
> ensemble with a session timeout of 15 seconds, and 1000 clients (group 
> B) already connected to the same ZK ensemble, all watching several nodes 
> (also with a 15-second session timeout).
> Consider a case in which all clients in group B suddenly hang or deadlock 
> (JVM OOME) at the same time. 15 seconds later, all sessions in group B 
> expire, creating a session-closing stampede. Depending on the number of 
> clients in group B, every request/response the ZK ensemble has to process is 
> delayed by up to 8 seconds (with the 1000 clients we tested).
> Because of this delay in getting heartbeat responses, some clients in group A 
> have their sessions expire. This causes healthy servers to drop out of 
> their clusters. This is a serious problem in our installation, since some of our 
> services running on batch servers or CI servers create the same scenario as 
> above almost every day.
> I am attaching a graph showing the ping response time delay.
> I think the ordering of session creation/closing relative to ping exchange isn't 
> important (quorum state machine). At the least, ping requests/responses should be 
> handled independently (different queue and different thread) to keep 
> pings responsive in real time.
> As a workaround, we are raising the session timeout to 50 seconds.
> But this significantly increases the maximum failover time of the cluster, so the 
> initial QoS we promised cannot be met.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
