Replication slows down over time
--------------------------------

                 Key: COUCHDB-1230
                 URL: https://issues.apache.org/jira/browse/COUCHDB-1230
             Project: CouchDB
          Issue Type: Bug
          Components: Replication
    Affects Versions: 1.1, 1.0.2
         Environment: Ubuntu 10.04
            Reporter: Paul Hirst


I have two databases which were replicated in the past. One is running 1.0.2; 
I shall call this the source database. The other is running 1.1.0; I shall 
call this the target database.

The source and target are replicated bidirectionally using a push and a pull 
replication, both run from the target (using a couple of documents in the new 
_replicator database).

The source database is in production and is getting changes applied to it from 
live systems. The target is only participating in replication and isn't being 
used directly by any production systems.

The database has about 50 million documents, many of which have been updated 
a handful of times. The database is about 500G after compaction, but the 
source database is currently at about 900G as it hasn't been compacted for a 
while.

The databases were replicated in the past; however, this replication was torn 
down when the target was upgraded from 1.0.2 to 1.1.0. When replication was 
re-enabled, the system wasn't able to pick up where it left off and had to 
re-enumerate all the documents. This process initially ran quickly, but after 
a while it ground to a halt, such that the target stopped making progress 
against the source database.

I found that restarting replication starts the process running again at a 
decent speed for a while. I did this by deleting and recreating the appropriate 
document in the _replicator database on the target.  
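For reference, the restart step can be sketched roughly as below. This is only 
an illustration: the function name, the document id, the revision and the 
database URLs are hypothetical, and "continuous": true is an assumption about 
the replication documents rather than something stated above. The code only 
builds the two HTTP requests (a DELETE of the current revision followed by a 
PUT of a fresh document); it does not send them.

```python
import json
import urllib.parse
import urllib.request


def build_restart_requests(base_url, doc_id, rev, source, target):
    """Build the DELETE and PUT requests that recreate one _replicator document.

    All argument values are supplied by the caller; nothing here is taken
    from the real deployment.
    """
    doc_url = "{}/_replicator/{}".format(
        base_url.rstrip("/"), urllib.parse.quote(doc_id, safe="")
    )

    # Deleting a document requires its current revision.
    delete_req = urllib.request.Request(
        doc_url + "?rev=" + urllib.parse.quote(rev, safe=""),
        method="DELETE",
    )

    # Recreate the document so the replication starts afresh.
    # "continuous": True is an assumption, not taken from the report.
    body = json.dumps(
        {"source": source, "target": target, "continuous": True}
    ).encode("utf-8")
    put_req = urllib.request.Request(
        doc_url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    return delete_req, put_req
```

Each request would then be sent in order with urllib.request.urlopen(), 
deleting first and recreating second.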

I have graphed the last_seq of the target database against time for about a 
day, noting when replication was manually restarted. I shall try to attach the 
graph if possible. It shows a clear improvement in replication speed after 
restarting replication.
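The slowdown in such a graph can be expressed as a rate. A minimal sketch, 
assuming the samples are (unix time in seconds, sequence number) pairs 
collected by periodic polling and that sequence numbers are plain integers; 
the function name and the figures in the usage note are made up for 
illustration and are not my measurements.

```python
def replication_rates(samples):
    """Return (end_time, sequence updates per second) for each sample interval.

    `samples` is a list of (unix_seconds, seq) pairs sorted by time.
    Intervals with no elapsed time are skipped to avoid dividing by zero.
    """
    rates = []
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        if t1 > t0:
            rates.append((t1, (s1 - s0) / (t1 - t0)))
    return rates
```

For example, replication_rates([(0, 0), (60, 6000), (120, 6600)]) gives 
[(60, 100.0), (120, 10.0)], i.e. a tenfold drop between the first and second 
minute.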

I previously witnessed this behaviour between 1.0.2 databases but didn't grab 
any stats at the time, so I don't think it's a new problem.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
