[ https://issues.apache.org/jira/browse/COUCHDB-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977520#comment-15977520 ]
ASF subversion and git services commented on COUCHDB-3324:
----------------------------------------------------------

Commit 4c48f69b1abddf8081ef1e04a05cb1ef1add51fb in couchdb's branch refs/heads/63012-scheduler from [~vatamane]
[ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4c48f69 ]

Cluster ownership module implementation

This module maintains cluster membership information for replication and
provides functions to check ownership of replication jobs.

A cluster membership change is registered only after a configurable
`cluster_quiet_period` interval has passed since the last node addition or
removal. This is useful during rolling node reboots in a cluster: it avoids
rescanning for membership changes after every node up and down event, and
instead performs only one rescan at the very end.

Jira: COUCHDB-3324


> Scheduling Replicator
> ---------------------
>
>                 Key: COUCHDB-3324
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-3324
>             Project: CouchDB
>          Issue Type: New Feature
>            Reporter: Nick Vatamaniuc
>
> Improve CouchDB replicator
>  * Allow running a large number of replication jobs
>  * Improve API with a focus on ease of use and performance. Avoid updating
> replication document with transient state updates. Instead create a proper
> API for querying replication states. At the same time provide a compatibility
> mode to let users keep existing behavior (of getting updates in documents).
>  * Improve network resource usage and performance. Multiple connections to the
> same cluster could share socket connections
>  * Handle rate limiting on target and source HTTP endpoints. Let replication
> requests auto-discover rate limit capacity based on a proven algorithm such as
> an Additive Increase / Multiplicative Decrease feedback control loop.
>  * Improve performance by avoiding repeatedly retrying failing replication
> jobs. Instead use exponential backoff.
>  * Improve recovery from long (but temporary) network failure. Currently, if
> replication jobs fail to start 10 times in a row they will not be retried
> anymore. This is not always desirable. In case of a long enough DNS (or other
> network) failure, replication jobs will effectively stop until they are
> manually restarted.
>  * Better handling of filtered replications: Failing to fetch filters could
> block couch replicator manager, lead to message queue backups and memory
> exhaustion. Also, when replication filter code changes, update the replication
> accordingly (replication job ID should change in that case)
>  * Provide better metrics to introspect replicator behavior.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
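
The quiet-period behavior described in the commit message can be illustrated with a minimal sketch. It is written in Python rather than the Erlang of the actual module, and it is not CouchDB's implementation: the `MembershipTracker` class, its methods, and the `FakeClock` helper are hypothetical names chosen for illustration. Only the `cluster_quiet_period` setting comes from the commit message. The sketch assumes something polls the tracker periodically.

```python
import time


class MembershipTracker:
    """Hypothetical sketch: coalesce node up/down events into one rescan."""

    def __init__(self, cluster_quiet_period, clock=time.monotonic):
        self.cluster_quiet_period = cluster_quiet_period  # seconds
        self._clock = clock
        self._last_event = None  # time of the most recent unprocessed event
        self.nodes = set()
        self.rescans = 0

    def node_up(self, node):
        self.nodes.add(node)
        self._last_event = self._clock()  # restart the quiet period

    def node_down(self, node):
        self.nodes.discard(node)
        self._last_event = self._clock()  # restart the quiet period

    def poll(self):
        """Register a membership change only once the cluster has been quiet."""
        if self._last_event is None:
            return False  # nothing pending
        if self._clock() - self._last_event < self.cluster_quiet_period:
            return False  # still inside the quiet period
        self._last_event = None
        self.rescans += 1  # a real system would rescan job ownership here
        return True


class FakeClock:
    """Manually advanced clock so the example runs without sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


# Simulate a rolling reboot of three nodes, 10 seconds between events.
clock = FakeClock()
tracker = MembershipTracker(cluster_quiet_period=60, clock=clock)
results = []
for node in ("node1", "node2", "node3"):
    tracker.node_down(node)
    clock.advance(10)
    tracker.node_up(node)
    clock.advance(10)
    results.append(tracker.poll())  # too soon after the last event

clock.advance(60)
results.append(tracker.poll())  # quiet period has elapsed
print(results, tracker.rescans)
```

In this simulation the six up/down events produce a single registered membership change, because each event restarts the quiet period and only the final poll falls outside it.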