[
https://issues.apache.org/jira/browse/ZOOKEEPER-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021447#comment-13021447
]
Chang Song commented on ZOOKEEPER-1049:
---------------------------------------
I have uploaded a Python-based reproducer and test results.
This is probably the worst-case iowait CPU% (you can find the results in the Result directory):
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           6.15   0.00     2.05     2.56    0.00  89.23
The resulting latency (with this reproducer): some clients see a 2-second
delay when closing their sessions:
close session elapsed : 0ms
close session elapsed : 0ms
close session elapsed : 2001ms
close session elapsed : 1ms
close session elapsed : 0ms
.....
close session elapsed : 0ms
close session elapsed : 0ms
close session elapsed : 2001ms
close session elapsed : 0ms
close session elapsed : 0ms
....
close session elapsed : 0ms
close session elapsed : 2000ms
close session elapsed : 2000ms
close session elapsed : 0ms
close session elapsed : 0ms
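The attached reproducer is the authoritative script; as a rough illustration of the kind of measurement behind the "close session elapsed" lines above (names and the stand-in close function here are illustrative, not the attached script's actual code):

```python
import time

def timed_close(close_fn):
    """Time a session close and print it in the reproducer's output format."""
    start = time.monotonic()
    close_fn()
    elapsed_ms = int((time.monotonic() - start) * 1000)
    print("close session elapsed : %dms" % elapsed_ms)
    return elapsed_ms

# Illustrative stand-in for a real ZooKeeper client close() call; in the
# actual reproducer this would be the client library closing its session.
def fake_close():
    time.sleep(0.005)  # pretend the server acknowledged the close in ~5 ms

timed_close(fake_close)
```

Under the stampede described in the issue, the real close call blocks on the server, and the elapsed time jumps to the ~2000 ms values shown above.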
> Session expire/close flooding renders heartbeats to delay significantly
> -----------------------------------------------------------------------
>
> Key: ZOOKEEPER-1049
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1049
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.3.2
> Environment: CentOS 5.3, three node ZK ensemble
> Reporter: Chang Song
> Priority: Critical
> Attachments: ZookeeperPingTest.zip, zk_ping_latency.pdf
>
>
> Let's say we have 100 clients (group A) already connected to a three-node ZK
> ensemble with a session timeout of 15 seconds, and 1000 clients (group B)
> already connected to the same ensemble, all watching several nodes (also
> with a 15-second session timeout).
> Consider a case in which all clients in group B suddenly hang or deadlock
> (e.g. JVM OOME) at the same time. 15 seconds later, all sessions in group B
> expire, creating a session-closing stampede. Depending on the number of
> clients in group B, every request/response the ZK ensemble processes is
> delayed by up to 8 seconds (with the 1000 clients we tested).
> This delay causes some clients in group A to have their sessions expired,
> because their heartbeat responses arrive late. This in turn causes healthy
> servers to drop out of their clusters. It is a serious problem in our
> installation, since some of our services run batch servers or CI servers
> that create the same scenario almost every day.
> I am attaching a graph showing the ping response time delay.
> I think the ordering of session creation/closing and ping exchanges isn't
> important (with respect to the quorum state machine); at the very least,
> ping requests/responses should be handled independently (in a separate
> queue on a separate thread) to preserve the real-time behavior of pings.
> As a workaround, we are raising the session timeout to 50 seconds, but this
> significantly increases the maximum failover time of the cluster, so the
> QoS we originally promised cannot be met.
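> (Note: the server caps client-requested session timeouts at
> maxSessionTimeout, which defaults to 20 x tickTime, i.e. 40 seconds with
> the default tickTime of 2000 ms, so a 50-second timeout also requires
> raising that cap in zoo.cfg. The values below are illustrative:
>   tickTime=2000
>   maxSessionTimeout=60000
> )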
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira