Jie Huang created ZOOKEEPER-3243:
------------------------------------
Summary: Add server side request throttling
Key: ZOOKEEPER-3243
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3243
Project: ZooKeeper
Issue Type: Improvement
Components: server
Reporter: Jie Huang
Fix For: 3.6.0
On-going performance investigation at Facebook has demonstrated that Zookeeper
is easily overwhelmed by spikes in connection rates and/or write request rates.
Zookeeper performance gets progressively worse, clients timeout and try to
reconnect (exacerbating the problem) and things enter a death spiral. To solve
this problem, we need to add load protection to Zookeeper via rate limiting and
work shedding.
This JIRA task adds a new request throttling mechanism (RequestThrottler) to
Zookeeper in hopes of preventing Zookeeper from becoming overwhelmed during
request spikes.
When enabled, the RequestThrottler limits the number Of outstanding requests
currently submitted to the request processor pipeline.
The throttler augments the limit imposed by the globalOutstandingLimit that is
enforced by the connection layer (NIOServerCnxn, NettyServerCnxn). The
connection layer limit applies backpressure against the TCP connection by
disabling selection on connections once the request limit is reached. However,
the connection layer always allows a connection to send at least one request
before disabling selection on that connection. Thus, in a scenario with 40000
client connections, the total number of requests inflight may be as high as
40000 even if the globalOustandingLimit was set lower.
The RequestThrottler addresses this issue by adding additional queueing. When
enabled, client connections no longer submit requests directly to the request
processor pipeline but instead to the RequestThrottler. The RequestThrottler is
then responsible for issuing requests to the request processors, and enforces a
separate maxRequests limit. If the total number of outstanding requests is
higher than maxRequests, the throttler will continually stall for stallTime
milliseconds until under limit.
The RequestThrottler can also optionally drop stale requests rather than submit
them to the processor pipeline. A stale request is a request sent by a
connection that is already closed, and/or a request whose latency will end up
being higher than its associated session timeout.
To ensure ordering guarantees, if a request is ever dropped from a connection
that connection is closed and flagged as invalid. All subsequent requests
inflight from that connection are then dropped as well.
The notion of staleness is configurable, both connection staleness and latency
staleness can be individually enabled/disabled. Both these settings and the
various throttle settings (limit, stall time, stale drop) can be configured via
system properties as well as at runtime via JMX.
The throttler has been tested and benchmarked at Facebook
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)