GitHub user tonysun83 opened a pull request:
https://github.com/apache/couchdb-couch-replicator/pull/38
Add New Rate Limiter For Replication
This PR is more of a design/feature discussion. I am currently testing
manually and writing tests. However, I'd like opinions about its usefulness and
overall feedback on the algorithm/design.
Purpose: To rate limit replications against clusters that enforce rate
limits (error 429), and to prevent DoS attacks caused by posting too many
replication docs.
Usage:
For a replication request we add in 4 new replication options:
src_rate_limit
src_rate_period
target_rate_limit
target_rate_period
These define the maximum number of requests per interval,
e.g. limit = 10, period = 5 would be a rate of 2 requests per second.
When the number of send_reqs exceeds the rate, we force that worker to sleep
proportionally to the overflow (more info on that below). This allows
replication to taper off when there are too many incoming requests.
Design (Modified Token Bucket Algorithm):
1 ETS table handled by a gen_server started by our
couch_replicator_supervisor. The ETS table's key is HOST, which is the hostname
of the remote url, stripped of the username and password. Each entry contains:
requestCounter - number of requests for one PERIOD interval.
lastUpdateTimestamp - the last time the background process reset the
requestCounter to 0.
pid - the background process that wakes up every PERIOD interval to
reset the requestCounter to 0.
replications - total number of Replication Jobs for this host.
limit - max send_requests for this host.
period - interval for send_requests.
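As an illustration of the table layout above, here is a minimal Python sketch (the PR itself is Erlang and uses ETS). The `HostEntry` and `host_key` names are hypothetical, and it assumes the key is the bare hostname without the port and that the period is in seconds; the PR does not confirm either.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class HostEntry:
    """One row of the per-host table; fields mirror the list above."""
    request_counter: int = 0             # requests seen in the current period
    last_update_timestamp: float = 0.0   # when the counter was last reset
    pid: Optional[int] = None            # stand-in for the resetting process
    replications: int = 0                # replication jobs using this host
    limit: int = 0                       # max requests per period
    period: float = 0.0                  # period length (assumed: seconds)


def host_key(url: str) -> str:
    """Return the hostname of url with username and password stripped.

    Assumption: the port is not part of the key.
    """
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no hostname in {url!r}")
    return host
```

Keying on the hostname alone means that two replications using different credentials against the same cluster share one counter, which matches the stated goal of limiting requests per remote host.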
1) When a replication starts, we check if limit or period is defined for
the replication job. If not, we initialize things normally. If both are
defined, then we either create or update an entry in the HOSTS table with the
new limit and period. We make this step synchronous with a gen_server:call to
avoid concurrency issues with our read+write. However, a person could still
change the limit and period for a particular host in the middle of another
replication. So Rep A would be at 10 req/s, and then Rep B could come in while
Rep A is still running and change it to 5 req/s. I still need to think
through whether we should allow this. Every entry should have an associated
process that resets the requestCounter to 0 every PERIOD.
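Step 1 can be sketched roughly as follows, in Python rather than Erlang. The lock is only a stand-in for the serialization that gen_server:call provides, `HostTable` and its methods are hypothetical names, and incrementing `replications` at registration is an assumption inferred from the decrement in step 3.

```python
import threading


class HostTable:
    """Per-host table; every update runs under one lock (gen_server stand-in)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def register(self, host, limit, period):
        """Create the entry for host, or overwrite its limit and period."""
        with self._lock:
            entry = self._entries.get(host)
            if entry is None:
                entry = {"request_counter": 0, "replications": 0,
                         "limit": limit, "period": period}
                self._entries[host] = entry
            else:
                # A later replication silently replaces the earlier settings.
                entry["limit"] = limit
                entry["period"] = period
            entry["replications"] += 1  # assumed: one count per started job
            return dict(entry)

    def reset_counter(self, host):
        """What the per-entry background process would do every PERIOD."""
        with self._lock:
            entry = self._entries.get(host)
            if entry is not None:
                entry["request_counter"] = 0
```

Calling `register("example.com", 10, 1)` and then `register("example.com", 5, 1)` leaves the entry at limit 5 with two replications, which is exactly the Rep A / Rep B override described above.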
2) Before each send_req, we check to see if there is a limit and period
defined. If not, sleep time is 0. If so, we update the requestCounter for that
Host. Then based on the requestCounter overflow we tell the worker process to
sleep proportionally + a delay time. So for example, if your limit is 5 reqs
per second, and the 6th request came in at 800 ms, then the process would sleep
200 ms. If the 7th request came in at 900 ms, it would sleep 100 ms + a 400 ms
delay and wake up at 1.4 seconds. We are trying to simulate a consistent rate.
Currently, I have no limit on the sleep time, so if a ton of requests came in,
there could be processes that sleep for a very long time. Most likely, I'll
just add a max sleep time; when it is reached, we cancel the replication. These
steps are asynchronous because ets:update_counter is atomic.
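The sleep calculation in step 2 might look something like the Python sketch below. The PR does not give the exact delay formula, so this assumes sleep = time left in the current period + (overflow - 1) * `delay_step_ms`, where `delay_step_ms` is a hypothetical parameter. A value of 400 ms is used in the usage note only because it reproduces the two numbers in the example above.

```python
def sleep_ms(counter, limit, period_ms, elapsed_ms, delay_step_ms):
    """Return how long a worker should sleep before sending its request.

    counter       - requestCounter after counting this request
    limit         - max requests per period
    period_ms     - period length in milliseconds
    elapsed_ms    - time since the counter was last reset
    delay_step_ms - assumed extra delay per request beyond the first overflow
    """
    overflow = counter - limit
    if overflow <= 0:
        return 0  # still within the limit: no sleep
    remaining = max(period_ms - elapsed_ms, 0)  # wait out the current period
    return remaining + (overflow - 1) * delay_step_ms
```

With a limit of 5 per 1000 ms, `sleep_ms(6, 5, 1000, 800, 400)` gives 200 and `sleep_ms(7, 5, 1000, 900, 400)` gives 500, so the 7th request wakes at 1.4 seconds. Because the delay grows with the overflow, nothing bounds the sleep here, which is the open issue the max sleep time is meant to address.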
3) At the end of a db_close, we synchronously decrement the replications
counter for a particular host. If the counter reaches 0, we remove the entry
from the table because there are no more replications for that host.
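Step 3 could be sketched along these lines, again in Python with a lock standing in for the synchronous gen_server call. The `release` name and the dict-based table are hypothetical.

```python
import threading

_table_lock = threading.Lock()


def release(table, host):
    """Decrement the replications counter for host.

    Removes the entry once no replication uses it and returns True in that
    case; returns False if the entry is kept or does not exist.
    """
    with _table_lock:
        entry = table.get(host)
        if entry is None:
            return False
        entry["replications"] -= 1
        if entry["replications"] <= 0:
            del table[host]
            return True
        return False
```

Doing the decrement and the delete under the same lock is what keeps a concurrent registration for the same host from updating an entry that is about to be removed.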
Let me know if there are any concurrency flaws in the logic above.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/couchdb-couch-replicator 3010-add-rate-limiter
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/couchdb-couch-replicator/pull/38.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #38
----
commit b7b2ac3663fc653ece65ab601d48333d07aac878
Author: Tony Sun <[email protected]>
Date: 2016-05-05T04:25:30Z
Add rate limiter for replication
For a remote replication request, we add in support for rate limiting.
We add in 4 new replication options:
src_rate_limit
src_rate_period
target_rate_limit
target_rate_period
These define the maximum number of requests per interval,
i.e limit = 10, period = 5 would be a rate of 2 requests per second.
COUCHDB-3010
----