[
https://issues.apache.org/jira/browse/COUCHDB-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743523#action_12743523
]
Adam Kocoloski commented on COUCHDB-416:
----------------------------------------
Hi Enda, good sleuthing. Trunk now throws a db_not_found exception if the
source and/or target DB does not exist. We should probably clean up the error
message that gets propagated to the client, but now it won't respawn like mad
and blow up logs.
I'm fairly certain that the missing DB was the problem. Multiple sources
replicating to the same target should work just fine. If you give the OK I'll
close this ticket.
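For context, a "pull" replication like the one described here is triggered by POSTing a JSON body to the target node's /_replicate endpoint; with the fix on trunk, a request naming a nonexistent source should now fail with a db_not_found error instead of respawning. A minimal sketch (hostnames and database names are placeholders, not from the ticket):

```shell
# Run against the aggregation (target) node. "pull" means the source
# is the remote URL and the target is the local database name.
# shard-1.example.com and shard_db/aggregate_db are placeholder names.
curl -X POST http://localhost:5984/_replicate \
     -H 'Content-Type: application/json' \
     -d '{"source": "http://shard-1.example.com:5984/shard_db", "target": "aggregate_db"}'
```

If shard_db does not exist on the source, the request should return an error response rather than leaving a replicator process crash-looping in the logs.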
> Replicating shards into a single aggregation node may cause endless respawning
> ------------------------------------------------------------------------------
>
> Key: COUCHDB-416
> URL: https://issues.apache.org/jira/browse/COUCHDB-416
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Affects Versions: 0.9
> Environment: couchdb 0.9.0.r766883 CentOS x86_64
> Reporter: Enda Farrell
> Assignee: Adam Kocoloski
> Priority: Critical
> Attachments: Picture 2.png
>
>
> I have a set of CouchDB instances, each one acting as a shard for a large set
> of data.
> Occasionally, we replicate each instance's database into a different CouchDB
> instance. We always "pull" replicate (see attached image).
> When we do this, we often see errors like this on the target instance:
> [Thu, 16 Jul 2009 13:52:32 GMT] [error] [emulator] Error in process
> <0.29787.102> with exit value:
> {function_clause,[{lists,map,[#Fun<couch_rep.6.75683565>,undefined]},{couch_rep,enum_docs_since,4}]}
>
> [Thu, 16 Jul 2009 13:52:32 GMT] [error] [<0.7456.6>] replication enumerator
> exited with {function_clause,
> [{lists,map,
> [#Fun<couch_rep.6.75683565>,undefined]},
> {couch_rep,enum_docs_since,4}]} .. respawning
> Once this starts, it is fatal to the CouchDB instance. It logs these messages
> at over 1000 per second (log level = severe) and chews up disk space.
> No errors (other than an HTTP timeout) are seen on the client side.
> After a database had gone "respawning", the target node was shut down, the logs
> were cleared, and the target node was restarted. The log was tailed - all was
> quiet. Once a single replication was called again against this database, it
> immediately went back into respawning hell. There were no stacked replications
> in this case.
> From this it seems that if a database ever goes into "respawning", it cannot
> recover (when your environment/setup requires replication to occur always).