[ https://issues.apache.org/jira/browse/COUCHDB-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977523#comment-15977523 ]

ASF subversion and git services commented on COUCHDB-3324:
----------------------------------------------------------

Commit 0e0b42c0f18944bc2073f93deab6dc857db6da30 in couchdb's branch 
refs/heads/63012-scheduler from [~vatamane]
[ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=0e0b42c ]

AIMD-based rate limiter implementation

AIMD: additive increase / multiplicative decrease feedback control algorithm.

https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease

This is an algorithm which converges on the available channel capacity.
No participant knows the capacity a priori, and participants don't
communicate with or know about each other (so they don't coordinate to
divide the capacity among themselves).

A variation of this is used in the TCP congestion control algorithm. AIMD is
proven to converge, while, for example, additive increase / additive decrease
or multiplicative increase / multiplicative decrease will not.
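
For reference, the base (un-tweaked) AIMD rule in the rate domain can be
sketched as below. This is only an illustration; the step and decrease values
are assumptions, not the ones used by the implementation.

    %% aimd_sketch.erl -- base AIMD rule in the rate domain (illustration only)
    -module(aimd_sketch).
    -export([success/1, failure/1]).

    -define(ADDITIVE_STEP, 1.0).     %% assumed additive increase per success
    -define(DECREASE_FACTOR, 0.5).   %% assumed multiplicative decrease per failure

    %% On success, grow the estimated rate additively.
    success(Rate) -> Rate + ?ADDITIVE_STEP.

    %% On failure, cut the estimated rate multiplicatively.
    failure(Rate) -> Rate * ?DECREASE_FACTOR.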

A few tweaks were applied to the base control logic (a combined sketch follows
the list):

 * The estimated value is an interval (period) instead of a rate. This is for
   convenience, as users will probably want to know how long to sleep. The
   rate is just 1000 / interval, so it is easy to convert between the two.

 * There is a hard max limit for the estimated period. This is mainly a
   practical concern, as connections sleeping too long will time out and / or
   jobs will waste time sleeping and consume scheduler slots while others
   could be running.

 * There is a time decay component used to handle large pauses between updates.
   In the case of a large gap between updates, assume (optimistically) that
   some successful requests have been made. Intuitively, the more time passes,
   the less accurate the estimated period probably is.

 * The rate of updates applied to the algorithm is limited. This effectively
   acts as a low-pass filter and makes the algorithm handle spikes and short
   bursts of failures better. This is not a novel idea; some alternative TCP
   congestion control algorithms, such as Westwood+, do something similar.

 * A large downward pressure is applied to the increasing interval as it
   approaches the max limit. This is done by tweaking the additive factor via
   a step function. In practice this has the effect of making it a bit harder
   for jobs to cross the maximum backoff threshold, as they would be killed
   and could potentially lose intermediate work.
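
Putting the interval domain, the hard max, the time decay, and the
step-function downward pressure together (and leaving out the update-rate
low-pass filter), a minimal combined sketch could look like the following.
The module name, constants, and the two-bucket step function are assumptions
for illustration, not the actual values used by the implementation.

    %% aimd_interval_sketch.erl -- illustration only; constants are assumptions
    -module(aimd_interval_sketch).
    -export([success/1, failure/1, decay/2]).

    -define(MAX_INTERVAL, 25000).   %% assumed hard max period (milliseconds)
    -define(MIN_INTERVAL, 0).
    -define(BACKOFF_FACTOR, 1.2).   %% failure multiplies the period by this
    -define(DECAY_FACTOR, 1.25).    %% assumed optimistic decay per idle second

    %% Success shrinks the period additively. The additive step is larger near
    %% the max limit (a crude two-bucket step function), which is the downward
    %% pressure that makes it harder for jobs to sit at the maximum backoff.
    success(Interval) when Interval > 0.75 * ?MAX_INTERVAL ->
        max(?MIN_INTERVAL, Interval - 500);
    success(Interval) ->
        max(?MIN_INTERVAL, Interval - 25).

    %% Failure grows the period multiplicatively, capped at the hard maximum.
    failure(Interval) ->
        min(?MAX_INTERVAL, round(Interval * ?BACKOFF_FACTOR)).

    %% Time decay: after a long pause between updates, optimistically assume
    %% some requests succeeded and shrink the stale estimate accordingly.
    decay(Interval, MsSinceLastUpdate) when MsSinceLastUpdate > 1000 ->
        Slots = MsSinceLastUpdate div 1000,
        round(Interval / math:pow(?DECAY_FACTOR, Slots));
    decay(Interval, _MsSinceLastUpdate) ->
        Interval.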

Main API functions are:

   success(Key) -> IntervalInMilliseconds

   failure(Key) -> IntervalInMilliseconds

   interval(Key) -> IntervalInMilliseconds

Key is any term (hashable by phash2). Typically it would be something like
{Method, Url}. The result of each function is the current period value. The
caller would then presumably sleep for that amount of time before or after
making requests. The current interval can be read with the interval(Key)
function.
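
A hypothetical caller could look like the sketch below; the rate_limiter
module name and the do_http_request/2 helper are placeholders for
illustration, not names from the actual code.

    %% Hypothetical usage; rate_limiter and do_http_request/2 are placeholders.
    make_request(Method, Url) ->
        Key = {Method, Url},
        %% Sleep for the currently estimated period before issuing the request.
        timer:sleep(rate_limiter:interval(Key)),
        case do_http_request(Method, Url) of
            {ok, Result} ->
                _ = rate_limiter:success(Key),          %% shrink the period
                {ok, Result};
            {error, request_limited} ->
                RetryAfter = rate_limiter:failure(Key), %% grow the period
                {retry_after, RetryAfter}
        end.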

The implementation uses ETS tables sharded by key, with a periodic timer that
cleans up unused items.
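
As a rough illustration of the sharding idea (the shard count and table
naming scheme here are assumptions): hash the key with erlang:phash2/2 to
pick one of N ETS tables, and let a periodic timer sweep each shard for
stale entries.

    %% Illustrative sharding helpers; shard count and table names are assumed.
    -define(SHARDS, 16).

    table_for(Key) ->
        Shard = erlang:phash2(Key, ?SHARDS),
        list_to_atom("rate_limiter_shard_" ++ integer_to_list(Shard)).

    lookup(Key) ->
        case ets:lookup(table_for(Key), Key) of
            [{Key, Interval, LastUpdated}] -> {ok, Interval, LastUpdated};
            [] -> not_found
        end.

    %% A periodic timer (e.g. erlang:send_after/3 into the owning gen_server)
    %% would then scan each shard with ets:select_delete/2 and drop entries
    %% whose LastUpdated timestamp is older than some threshold.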

Jira: COUCHDB-3324


> Scheduling Replicator
> ---------------------
>
>                 Key: COUCHDB-3324
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-3324
>             Project: CouchDB
>          Issue Type: New Feature
>            Reporter: Nick Vatamaniuc
>
> Improve CouchDB replicator
>  * Allow running a large number of replication jobs
>  * Improve the API with a focus on ease of use and performance. Avoid 
> updating the replication document with transient state updates. Instead, 
> create a proper API for querying replication states. At the same time, 
> provide a compatibility mode to let users keep the existing behavior 
> (getting updates in documents).
>  * Improve network resource usage and performance. Multiple connections to 
> the same cluster could share socket connections.
>  * Handle rate limiting on target and source HTTP endpoints. Let replication 
> requests auto-discover rate limit capacity based on a proven algorithm such 
> as the Additive Increase / Multiplicative Decrease feedback control loop.
>  * Improve performance by avoiding repeatedly retrying failing replication 
> jobs. Instead, use exponential backoff.
>  * Improve recovery from long (but temporary) network failures. Currently, 
> if replication jobs fail to start 10 times in a row, they will not be 
> retried anymore. This is not always desirable. In the case of a long enough 
> DNS (or other network) failure, replication jobs will effectively stop until 
> they are manually restarted.
>  * Better handling of filtered replications: failing to fetch filters could 
> block the couch replicator manager and lead to message queue backups and 
> memory exhaustion. Also, when replication filter code changes, update the 
> replication accordingly (the replication job ID should change in that case).
>  * Provide better metrics to introspect replicator behavior.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
