Jie Huang created ZOOKEEPER-3242:
------------------------------------
Summary: Add server side connecting throttling
Key: ZOOKEEPER-3242
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3242
Project: ZooKeeper
Issue Type: Improvement
Components: server
Reporter: Jie Huang
Fix For: 3.6.0
On-going performance investigation at Facebook has demonstrated that Zookeeper
is easily overwhelmed by spikes in connection rates and/or write request rates.
Zookeeper performance gets progressively worse, clients timeout and try to
reconnect (exacerbating the problem) and things enter a death spiral. To solve
this problem, we need to add load protection to Zookeeper via rate limiting and
work shedding.
This Jira adds a new connection rate limiting mechanism to Zookeeper in hopes
of preventing Zookeeper from becoming overwhelmed during connection spikes.
The new throttle is focused on limiting connections per second. The throttle is
implemented as a token-bucket with optional probabilistic dropping based on the
BLUE queue management algorithm.
This token-bucket design allows the throttle to allow short bursts to pass,
while still capping the total number of requests per second. However, an issue
with a token bucket approach is that the wall clock arrival time of requests
affects the probability of a request being allowed to pass or not. Under
constant load this can lead to request starvation for requests that constantly
arrive later than the majority. The optional probabilistic dropping mechanism
is designed to combat this, making rejections a random event with little skew
based on arrival time.
A more verbose description can be found in the comments in
org.apache.zookeeper.server.BlueThrottle.
By default, both the token-bucket and probabilistic dropping mechanism are
disabled. Enabling and tuning the throttles can be done both via Java system
properties as well as against a running node via JMX.
The throttle has been tested and benchmarked at Facebook.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)