[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3243:
--------------------------------------

    Assignee: Jie Huang

> Add server side request throttling
> ----------------------------------
>
>                 Key: ZOOKEEPER-3243
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3243
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Jie Huang
>            Assignee: Jie Huang
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.6.0
>
>          Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> On-going performance investigation at Facebook has demonstrated that 
> Zookeeper is easily overwhelmed by spikes in connection rates and/or write 
> request rates. Zookeeper performance gets progressively worse, clients 
> timeout and try to reconnect (exacerbating the problem) and things enter a 
> death spiral. To solve this problem, we need to add load protection to 
> Zookeeper via rate limiting and work shedding.
> This JIRA task adds a new request throttling mechanism (RequestThrottler) to 
> Zookeeper in hopes of preventing Zookeeper from becoming overwhelmed during 
> request spikes.
>  
> When enabled, the RequestThrottler limits the number Of outstanding requests 
> currently submitted to the request processor pipeline. 
>  
> The throttler augments the limit imposed by the globalOutstandingLimit that 
> is enforced by the connection layer (NIOServerCnxn, NettyServerCnxn). The 
> connection layer limit applies backpressure against the TCP connection by 
> disabling selection on connections once the request limit is reached. 
> However, the connection layer always allows a connection to send at least one 
> request before disabling selection on that connection. Thus, in a scenario 
> with 40000 client connections, the total number of requests inflight may be 
> as high as 40000 even if the globalOustandingLimit was set lower.
>  
> The RequestThrottler addresses this issue by adding additional queueing. When 
> enabled, client connections no longer submit requests directly to the request 
> processor pipeline but instead to the RequestThrottler. The RequestThrottler 
> is then responsible for issuing requests to the request processors, and 
> enforces a separate maxRequests limit. If the total number of outstanding 
> requests is higher than maxRequests, the throttler will continually stall for 
> stallTime milliseconds until under limit.
>  
> The RequestThrottler can also optionally drop stale requests rather than 
> submit them to the processor pipeline. A stale request is a request sent by a 
> connection that is already closed, and/or a request whose latency will end up 
> being higher than its associated session timeout.
> To ensure ordering guarantees, if a request is ever dropped from a connection 
> that connection is closed and flagged as invalid. All subsequent requests 
> inflight from that connection are then dropped as well.
>  
> The notion of staleness is configurable, both connection staleness and 
> latency staleness can be individually enabled/disabled. Both these settings 
> and the various throttle settings (limit, stall time, stale drop) can be 
> configured via system properties as well as at runtime via JMX.
>  
> The throttler has been tested and benchmarked at Facebook



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to