[
https://issues.apache.org/jira/browse/COUCHDB-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000442#comment-14000442
]
Alexander Shorin commented on COUCHDB-2240:
-------------------------------------------
How much is "many"? 100, 200, 500, 1K, 10K, 1M?
> Many continuous replications cause DOS
> --------------------------------------
>
> Key: COUCHDB-2240
> URL: https://issues.apache.org/jira/browse/COUCHDB-2240
> Project: CouchDB
> Issue Type: Bug
> Security Level: public(Regular issues)
> Reporter: Eli Stevens
>
> Currently, I can configure an arbitrary number of replications between
> localhost DBs (in my case, they are in the _replicator DB with continuous set
> to true). However, there is a limit beyond which requests to the DB start to
> fail. Trying to do another replication fails with the error:
> ServerError: (500, ('checkpoint_commit_failure', "Target database out of
> sync. Try to increase max_dbs_open at the target's server."))
> Due to COUCHDB-2239, it's not clear what the actual issue is.
> I also believe that while the DB was in this state, GET requests for documents
> were also failing, but the machine that had the logs has since had its drives
> wiped. If need be, I can recreate the situation and provide those logs as
> well.
> I think that instead of a single fixed pool of resources that produces errors
> when exhausted, the system should have a per-task-type pool of resources
> whose exhaustion degrades performance instead: N replication workers sharing
> P DB connections, round-robining when that's not enough; that sort of thing.
> When a user has too much to replicate, replication gets slow instead of
> failing.
> As it stands now, I have a potentially large number of continuous
> replications that produce a fixed rate of data to replicate (because there's
> a fixed application worker pool that writes the data in the first place). We
> use a DB+replication per batch of data to process, and if we receive a burst
> of batches, then CouchDB starts failing. The current setup means that I'm
> always going to be playing chicken between burst size and whatever setting
> limit we're hitting. That sucks, and isn't acceptable for a production
> system, so we're going to have to re-architect how we do replication and
> basically implement a poor man's continuous replication by doing one-off
> replications at various points of our data-processing runs.
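For anyone trying to reproduce this, a minimal sketch of the setup described above might look like the following (database and document names are illustrative, not from the original report). Each batch gets its own continuous replication document in the `_replicator` database, and the error message suggests raising `max_dbs_open`, which on the 1.x line can be changed at runtime through the `_config` API:

```shell
# Hypothetical reproduction sketch against a local CouchDB (1.x-era API).
# One continuous replication document per batch; names are made up.
curl -X PUT http://localhost:5984/_replicator/rep-batch-001 \
     -H 'Content-Type: application/json' \
     -d '{"source": "batch-001", "target": "processed-001", "continuous": true}'

# Creating many such documents in a burst is what appears to exhaust the
# open-databases pool and trigger the checkpoint_commit_failure above.

# The workaround the error message itself suggests: raise max_dbs_open
# (value chosen arbitrarily here).
curl -X PUT http://localhost:5984/_config/couchdb/max_dbs_open -d '"500"'
```

Note this only moves the ceiling; the report's point is that hitting it should degrade throughput rather than fail requests outright.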
--
This message was sent by Atlassian JIRA
(v6.2#6252)