[ https://issues.apache.org/jira/browse/COUCHDB-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000535#comment-14000535 ]

Eli Stevens commented on COUCHDB-2240:
--------------------------------------

Depends on your settings. My understanding of what's causing the issue is that 
the default value of max_dbs_open is 100. I can raise that value arbitrarily 
high and hope that I never need to process a burst of activity greater than my 
arbitrarily high value, but that's not really desirable either, since I would 
much prefer to have an arbitrarily high number of replications going that only 
use a fixed pool of resources (and if those resources end up taxed, then 
performance degrades gracefully instead of erroring out).
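
For reference, the workaround I'm describing is bumping this setting in the 
[couchdb] section of local.ini (the value 5000 below is just an example, not a 
recommendation):

```ini
[couchdb]
; Any fixed value can still be exhausted by a large enough burst of
; concurrent replications; raising it only moves the cliff.
max_dbs_open = 5000
```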

I understand that this is not a small change that I'm suggesting.

> Many continuous replications cause DOS
> --------------------------------------
>
>                 Key: COUCHDB-2240
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2240
>             Project: CouchDB
>          Issue Type: Bug
>      Security Level: public(Regular issues) 
>            Reporter: Eli Stevens
>
> Currently, I can configure an arbitrary number of replications between 
> localhost DBs (in my case, they are in the _replicator DB with continuous set 
> to true). However, there is a limit beyond which requests to the DB start to 
> fail.  Trying to do another replication fails with the error:
> ServerError: (500, ('checkpoint_commit_failure', "Target database out of 
> sync. Try to increase max_dbs_open at the target's server."))
> Due to COUCHDB-2239, it's not clear what the actual issue is. 
> I also believe that while the DB was in this state, GET requests to documents 
> were also failing, but the machine that has the logs has already had its 
> drives wiped. If need be, I can recreate the situation and provide those 
> logs as well.
> I think that instead of there being a single fixed pool of resources that 
> cause errors when exhausted, the system should have a per-task-type pool of 
> resources that result in performance degradation when exhausted. N 
> replication workers with P DB connections, and if that's not enough they 
> start to round-robin; that sort of thing. When a user has too much to 
> replicate, it gets slow instead of failing.
> As it stands now, I have a potentially large number of continuous 
> replications that produce a fixed rate of data to replicate (because there's 
> a fixed application worker pool that writes the data in the first place). We 
> use a DB+replication per batch of data to process, and if we receive a burst 
> of batches, then CouchDB starts failing. The current setup means that I'm 
> always going to be playing chicken between burst size and whatever setting 
> limit we're hitting. That sucks, and isn't acceptable for a production 
> system, so we're going to have to re-architect how we do replication and 
> basically implement a poor man's continuous replication by doing one-off 
> replications at various points of our data processing runs.
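
The fixed-pool, round-robin behavior proposed in the issue could be sketched 
roughly as below (pure illustration in Python; the names, pool size, and 
`replicate_one_batch` placeholder are invented for this sketch and are not 
CouchDB internals):

```python
# Sketch of the proposal: a fixed pool of replication workers shares an
# unbounded queue of replication tasks, so excess load slows throughput
# instead of failing with a hard resource-limit error.
import queue
import threading

N_WORKERS = 4  # fixed resource pool, regardless of how many tasks arrive

def replicate_one_batch(task):
    # Placeholder for one replication pass over `task`.
    return f"replicated {task}"

def worker(tasks, results):
    while True:
        task = tasks.get()
        if task is None:  # sentinel: shut down this worker
            break
        results.append(replicate_one_batch(task))
        tasks.task_done()

tasks = queue.Queue()
results = []
threads = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(N_WORKERS)]
for t in threads:
    t.start()

# Twenty replications share four workers: each task is picked up in turn
# rather than each replication demanding its own open database handle.
for task_id in range(20):
    tasks.put(f"db-{task_id}")
tasks.join()  # block until every queued task has been processed

for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()

print(len(results))  # all 20 tasks completed by the 4-worker pool
```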



--
This message was sent by Atlassian JIRA
(v6.2#6252)