[
https://issues.apache.org/jira/browse/COUCHDB-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000442#comment-14000442
]
Alexander Shorin commented on COUCHDB-2240:
-------------------------------------------
How much is "many"? 100, 200, 500, 1K, 10K, 1M?
> Many continuous replications cause DOS
> --------------------------------------
>
> Key: COUCHDB-2240
> URL: https://issues.apache.org/jira/browse/COUCHDB-2240
> Project: CouchDB
> Issue Type: Bug
> Security Level: public(Regular issues)
> Reporter: Eli Stevens
>
> Currently, I can configure an arbitrary number of replications between
> localhost DBs (in my case, they are in the _replicator DB with continuous set
> to true). However, there is a limit beyond which requests to the DB start to
> fail. Trying to do another replication fails with the error:
> ServerError: (500, ('checkpoint_commit_failure', "Target database out of
> sync. Try to increase max_dbs_open at the target's server."))
> Due to COUCHDB-2239, it's not clear what the actual issue is.
> I also believe that while the DB was in this state, GET requests for documents
> were also failing, but the machine that had the logs has since had its drives
> wiped. If need be, I can recreate the situation and provide those logs as
> well.
> I think that instead of a single fixed pool of resources that produces errors
> when exhausted, the system should have a per-task-type pool of resources
> whose exhaustion degrades performance instead: N replication workers sharing
> P DB connections, round-robining when that's not enough; that sort of thing.
> When a user has too much to replicate, replication gets slow instead of
> failing.
> As it stands now, I have a potentially large number of continuous
> replications that produce a fixed rate of data to replicate (because there's
> a fixed application worker pool that writes the data in the first place). We
> use a DB+replication per batch of data to process, and if we receive a burst
> of batches, then CouchDB starts failing. The current setup means that I'm
> always going to be playing chicken between burst size and whatever setting
> limit we're hitting. That sucks, and isn't acceptable for a production
> system, so we're going to have to re-architect how we do replication and
> basically implement a poor man's continuous replication by doing one-off
> replications at various points of our data-processing runs.
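For anyone trying to reproduce this, a minimal sketch of the setup described above might look like the following (database and document names are illustrative, not from the original report). Each batch gets its own continuous replication document in the `_replicator` database, and the error message suggests raising `max_dbs_open`, which on the 1.x line can be changed at runtime through the `_config` API:

```shell
# Hypothetical reproduction sketch against a local CouchDB (1.x-era API).
# One continuous replication document per batch; names are made up.
curl -X PUT http://localhost:5984/_replicator/rep-batch-001 \
     -H 'Content-Type: application/json' \
     -d '{"source": "batch-001", "target": "processed-001", "continuous": true}'

# Creating many such documents in a burst is what appears to exhaust the
# open-databases pool and trigger the checkpoint_commit_failure above.

# The workaround the error message itself suggests: raise max_dbs_open
# (value chosen arbitrarily here).
curl -X PUT http://localhost:5984/_config/couchdb/max_dbs_open -d '"500"'
```

Note this only moves the ceiling; the report's point is that hitting it should degrade throughput rather than fail requests outright.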
--
This message was sent by Atlassian JIRA
(v6.2#6252)