[ 
https://issues.apache.org/jira/browse/COUCHDB-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989489#comment-15989489
 ] 

ASF subversion and git services commented on COUCHDB-3324:
----------------------------------------------------------

Commit 25054365a0a198d829a7414f8c0c10e0a5ac6651 in couchdb's branch 
refs/heads/63012-scheduler from [~sagelywizard]
[ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=2505436 ]

Share connections between replications

This commit adds functionality to share connections between
replications. This is to solve two problems:

- Prior to this commit, each replication would create a pool of
  connections and hold onto those connections as long as the replication
  existed. This was wasteful and caused CouchDB to use many unnecessary
  connections.
- When the pool was being terminated, the pool would block while the
  socket was closed. This would cause the entire replication scheduler
  to block. By reusing connections, connections are never closed by
  clients. They are only ever relinquished. This operation is always
  fast.

This commit adds an intermediary process which tracks which connection
processes are being used by which client. It monitors clients and
connections. If a client or connection crashes, the paired
client/connection will be terminated.

A client can gracefully relinquish ownership of a connection. If that
happens, the connection will be shared with another client. If the
connection remains idle for too long, it will be closed.

Jira: COUCHDB-3324


> Scheduling Replicator
> ---------------------
>
>                 Key: COUCHDB-3324
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-3324
>             Project: CouchDB
>          Issue Type: New Feature
>            Reporter: Nick Vatamaniuc
>
> Improve CouchDB replicator
>  * Allow running a large number of replication jobs
>  * Improve API with a focus on ease of use and performance. Avoid updating 
> replication document with transient state updates. Instead create a proper 
> API for querying replication states. At the same time provide a compatibility 
> mode to let users keep existing behavior (of getting updates in documents).
>  * Improve network resource usage and performance. Multiple connections to the 
> same cluster could share socket connections.
>  * Handle rate limiting on target and source HTTP endpoints. Let replication 
> request auto-discover rate limit capacity based on a proven algorithm such as 
> Additive Increase / Multiplicative Decrease feedback control loop.
>  * Improve performance by avoiding repeatedly retrying failing replication 
> jobs. Instead use exponential backoff. 
>  * Improve recovery from long (but temporary) network failure. Currently if 
> replication jobs fail to start 10 times in a row they will not be retried 
> anymore. This is not always desirable. In case of a long enough DNS (or other 
> network) failure replication jobs will effectively stop until they are 
> manually restarted.
>  * Better handling of filtered replications: Failing to fetch filters could 
> block couch replicator manager, lead to message queue backups and memory 
> exhaustion. Also, when replication filter code changes, update the replication 
> accordingly (the replication job ID should change in that case).
>  * Provide better metrics to introspect replicator behavior.
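
The rate-limiting and retry items in the issue description above name two
standard techniques, Additive Increase / Multiplicative Decrease and
exponential backoff. A minimal Python sketch of both follows. The function
names and every constant (increase, decrease, floor, base, cap) are
assumptions chosen for illustration and are not taken from CouchDB.

```python
def aimd_step(rate, throttled, increase=1.0, decrease=0.5, floor=1.0):
    """One AIMD update of a request rate.

    Adds a constant after a successful interval and multiplies the rate
    down when the endpoint signalled rate limiting. All defaults are
    illustrative assumptions.
    """
    if throttled:
        return max(floor, rate * decrease)
    return rate + increase


def backoff_delay(failures, base=5.0, cap=3600.0):
    """Seconds to wait before retrying after `failures` consecutive failures.

    The delay doubles with each failure and is capped, so a job keeps
    being retried (slowly) instead of being abandoned after a fixed count.
    """
    return min(cap, base * 2 ** failures)
```

With these assumed defaults, a job that has failed three times in a row
would wait 40 seconds before its next attempt, and the wait never exceeds
one hour.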



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
