[ 
https://issues.apache.org/jira/browse/COUCHDB-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977602#comment-15977602
 ] 

ASF subversion and git services commented on COUCHDB-3324:
----------------------------------------------------------

Commit 87f4ca0454909466b068d9c10c11467a3e85d3cb in couchdb's branch 
refs/heads/63012-scheduler from [~vatamane]
[ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=87f4ca0 ]

Implement replication document processor

The document processor listens for `_replicator` db document updates, parses
those changes, then tries to add replication jobs to the scheduler.

Listening for changes happens in the `couch_multidb_changes` module. That module
is generic; `couch_replicator_db_changes` sets it up to listen to shards with
the `_replicator` suffix. Updates are then passed to the document processor's
`process_change/2` function.
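A minimal Python sketch of this routing, for illustration only: the real modules are Erlang, and the class `MultiDbChanges`, its `on_change` method and the assumed shard path layout `shards/<range>/<dbname>.<timestamp>` are hypothetical stand-ins, not the `couch_multidb_changes` API. A generic listener is configured with a suffix and forwards matching changes to a callback that plays the role of `process_change/2`.

```python
import re
from typing import Callable, Dict, Optional

# Assumed shard path layout: "shards/<hex range>/<dbname>.<timestamp>".
SHARD_RE = re.compile(r"^shards/[0-9a-f]{8}-[0-9a-f]{8}/(?P<db>.+)\.\d+$")


def db_name_from_shard(shard: str) -> Optional[str]:
    """Return the logical database name for a shard path, or None."""
    m = SHARD_RE.match(shard)
    return m.group("db") if m else None


class MultiDbChanges:
    """Hypothetical generic listener: forwards changes from shards whose
    database name ends with `suffix` to `callback(db_name, change)`."""

    def __init__(self, suffix: str, callback: Callable[[str, Dict], None]):
        self.suffix = suffix
        self.callback = callback

    def on_change(self, shard: str, change: Dict) -> bool:
        db = db_name_from_shard(shard)
        if db is None or not db.endswith(self.suffix):
            return False  # not a shard this listener watches
        self.callback(db, change)
        return True
```

Keeping the listener generic and injecting the suffix and callback mirrors the split described above: one module knows how to follow many shards, another decides which ones matter.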

Calculating a document's replication ID, which can involve fetching filter code
from the source DB, and adding the job to the scheduler are done in a separate
worker process: `couch_replicator_doc_processor_worker`.
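To illustrate why moving the blocking work into workers helps, here is a hypothetical Python sketch (not the Erlang implementation): `DocProcessor.process_change` only submits the replication ID calculation to a thread pool and returns, so a slow filter fetch for one document does not hold up the next update. `compute_rep_id`, its simulated delay and the fields it hashes are assumptions.

```python
import hashlib
import json
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict


def compute_rep_id(doc: Dict) -> str:
    """Hypothetical stand-in for replication ID calculation.

    The real step may fetch filter code from the source DB and block;
    a short sleep simulates that network round trip here.
    """
    time.sleep(0.05)
    payload = json.dumps({"source": doc["source"], "target": doc["target"]},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class DocProcessor:
    def __init__(self, max_workers: int = 4) -> None:
        self.pool = ThreadPoolExecutor(max_workers=max_workers)
        self.jobs: Dict[str, Future] = {}

    def process_change(self, doc_id: str, doc: Dict) -> None:
        # Returns immediately; the potentially blocking work runs on a
        # worker thread, so the caller's queue keeps draining.
        self.jobs[doc_id] = self.pool.submit(compute_rep_id, doc)

    def shutdown(self) -> None:
        self.pool.shutdown(wait=True)
```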

Previously, the couch replicator manager did most of this work. There are a few
improvements over the previous implementation:

 * Invalid (malformed) replication documents are immediately failed and will
 not be continuously retried.

 * Replication manager message queue backups are unfortunately a common issue
 in production. This is because processing document updates is a serial
 (blocking) operation. Most of that blocking code was moved to separate worker
 processes.

 * Failing filter fetches have an exponential backoff.

 * Replication documents no longer have to be deleted and then re-added in
 order to update the replication. On update, the document processor compares
 the new and previous replication-related document fields and updates the
 replication job if those changed. Users can freely update unrelated (custom)
 fields in their replication docs.

 * For filtered replications using custom functions, the document processor
 will periodically check whether the filter code on the source has changed.
 Filter code contents are factored into the replication ID calculation, so if
 the filter code changes, the replication ID will change as well.
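The update comparison and the filter-code dependency can be sketched as follows. This is a hypothetical Python illustration: the `REP_FIELDS` set and the SHA-256-over-JSON scheme are assumptions, not CouchDB's actual replication ID algorithm. Changing a custom field leaves the job alone, while changing a replication field or the filter's source code yields a different ID.

```python
import hashlib
import json
from typing import Dict, Optional

# Assumed set of replication-related fields; any other field is treated as
# a user's custom field and does not affect the replication job.
REP_FIELDS = ("source", "target", "filter", "query_params", "doc_ids",
              "selector", "continuous", "create_target")


def rep_fields(doc: Dict) -> Dict:
    """Keep only the replication-related fields of a document."""
    return {k: doc[k] for k in REP_FIELDS if k in doc}


def replication_id(doc: Dict, filter_code: Optional[str]) -> str:
    """Hash the replication-related fields together with the filter source,
    so editing the filter function on the source changes the ID."""
    payload = json.dumps({"fields": rep_fields(doc), "filter_code": filter_code},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def needs_job_update(old_doc: Dict, new_doc: Dict) -> bool:
    """True only when replication-related fields differ between revisions."""
    return rep_fields(old_doc) != rep_fields(new_doc)
```

A periodic filter check then amounts to re-fetching the filter code, recomputing `replication_id`, and restarting the job only if the result differs from the stored ID.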

Jira: COUCHDB-3324


> Scheduling Replicator
> ---------------------
>
>                 Key: COUCHDB-3324
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-3324
>             Project: CouchDB
>          Issue Type: New Feature
>            Reporter: Nick Vatamaniuc
>
> Improve CouchDB replicator
>  * Allow running a large number of replication jobs
>  * Improve the API with a focus on ease of use and performance. Avoid updating
> replication documents with transient state. Instead, create a proper API for
> querying replication states. At the same time, provide a compatibility mode
> to let users keep the existing behavior (of getting updates in documents).
>  * Improve network resource usage and performance. Multiple connections to the
> same cluster could share socket connections.
>  * Handle rate limiting on target and source HTTP endpoints. Let replication
> requests auto-discover rate limit capacity based on a proven algorithm such as
> an Additive Increase / Multiplicative Decrease (AIMD) feedback control loop.
>  * Improve performance by avoiding repeatedly retrying failing replication 
> jobs. Instead use exponential backoff. 
>  * Improve recovery from long (but temporary) network failures. Currently, if
> replication jobs fail to start 10 times in a row, they will not be retried
> anymore. This is not always desirable. In case of a long enough DNS (or other
> network) failure, replication jobs will effectively stop until they are
> manually restarted.
>  * Better handling of filtered replications: failing to fetch filters could
> block the couch replicator manager and lead to message queue backups and memory
> exhaustion. Also, when replication filter code changes, update the replication
> accordingly (the replication job ID should change in that case).
>  * Provide better metrics to introspect replicator behavior.
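Two of the control loops proposed above, capped exponential backoff that keeps retrying instead of giving up after a fixed number of attempts, and an AIMD rate controller, might look like this in a minimal Python sketch. The function and class names, the default constants, and the use of an HTTP 429 response as the rate-limit signal are assumptions for illustration.

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 3600.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based).

    Doubles with each consecutive failure and is capped at `cap`, so a job
    that keeps failing (e.g. during a long DNS outage) is still retried,
    at most once per `cap` seconds, instead of being abandoned.
    """
    exponent = min(attempt, 30)  # keep 2**exponent small for float math
    return min(cap, base * (2 ** exponent))


class AIMDRate:
    """Additive Increase / Multiplicative Decrease control of a request rate."""

    def __init__(self, rate: float = 10.0, increase: float = 1.0,
                 decrease: float = 0.5, min_rate: float = 1.0,
                 max_rate: float = 1000.0) -> None:
        self.rate = rate          # current allowed requests per second
        self.increase = increase  # added after each successful request
        self.decrease = decrease  # multiplier applied when rate limited
        self.min_rate = min_rate
        self.max_rate = max_rate

    def on_success(self) -> float:
        # Probe for more capacity slowly (additive increase).
        self.rate = min(self.max_rate, self.rate + self.increase)
        return self.rate

    def on_rate_limited(self) -> float:
        # Assumed signal: the endpoint replied with HTTP 429.
        # Back off quickly (multiplicative decrease).
        self.rate = max(self.min_rate, self.rate * self.decrease)
        return self.rate
```

The asymmetry is the point of AIMD: slow linear probing finds spare capacity, while halving on every rate-limit response quickly relieves an overloaded endpoint.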



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
