[
https://issues.apache.org/jira/browse/COUCHDB-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Markham updated COUCHDB-1505:
----------------------------------
Attachment: replicationcancelerror1.log, couchjs.txt
> Error on cancelling replication - possibly related to hanging replications
> --------------------------------------------------------------------------
>
> Key: COUCHDB-1505
> URL: https://issues.apache.org/jira/browse/COUCHDB-1505
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 1.2
> Environment: CentOS 5.6 x64. WAN replication (between datacentres).
> Cronjob controlled replication curls every 5 mins. Using pull replication
> with a filter.
> Reporter: Alex Markham
> Labels: cancel, hang, replication
> Attachments: couchjs.txt, replicationcancelerror1.log
>
>
> We run a cronjob that cancels replication and then starts it again every 5
> minutes. Occasionally, when cancelling replication jobs, a stack trace
> appears in the couchdb log (attached).
> Other observations (perhaps unrelated): over time we slowly accumulate
> "zombie" couchjs processes. After a month or so (different for each
> server) we get close to our os_process_limit of 200 and we restart
> couchdb. "Zombie" is speculation here, but there seems to be no need for a
> hundred+ couchjs processes when just replicating 10 databases with
> occasional indexing; after a restart the count drops right back down. The
> start times of those processes are also weeks old. This may be normal, not
> sure.
> Why do we cancel replication and restart it? We found that if we don't,
> WAN replications can hang: curling /_replicate reports that the continuous
> replication is already running, but the replications are not updating and
> the document counts in the databases diverge.
> Immediately after we re-enabled the "cancel":true /_replicate call before
> each start, these stack traces re-appeared and the replication caught up.
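
For anyone trying to reproduce this, the cancel-then-start cycle described above can be sketched roughly as follows. This is only an illustration of the pattern, not the reporter's actual cron script. The host names, database names and the filter name "app/by_site" are hypothetical, and the sketch assumes the 1.x behaviour where a replication is cancelled by POSTing the same body to /_replicate with "cancel": true added.

```python
import json
import urllib.error
import urllib.request


def replication_body(source, target, filter_name, cancel=False):
    """Build the JSON body for a continuous, filtered POST to /_replicate."""
    body = {
        "source": source,        # remote URL for a pull replication
        "target": target,        # local database name
        "continuous": True,
        "filter": filter_name,   # "designdoc/filtername" (hypothetical here)
    }
    if cancel:
        # Cancel uses the same fields as the original request plus "cancel": true.
        body["cancel"] = True
    return body


def post_replicate(couch_url, body, timeout=30):
    """POST body to <couch_url>/_replicate and return the decoded JSON reply."""
    req = urllib.request.Request(
        couch_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # For example, a cancel request when nothing was running.
        return {"ok": False, "status": err.code}


def restart_replication(couch_url, source, target, filter_name,
                        post=post_replicate):
    """Cancel the replication, then start it again, as the cronjob does."""
    cancel_reply = post(
        couch_url, replication_body(source, target, filter_name, cancel=True))
    start_reply = post(
        couch_url, replication_body(source, target, filter_name))
    return cancel_reply, start_reply
```

Run from cron every 5 minutes, a call such as restart_replication("http://localhost:5984", "http://remote.example:5984/db", "db", "app/by_site") would issue the two requests in order. The stack trace in the attached log would be expected around the first (cancel) request.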