Isaac Z. Schlueter created COUCHDB-2102:
-------------------------------------------
Summary: Downstream replicator database bloat
Key: COUCHDB-2102
URL: https://issues.apache.org/jira/browse/COUCHDB-2102
Project: CouchDB
Issue Type: Bug
Security Level: public (Regular issues)
Components: Replication
Reporter: Isaac Z. Schlueter
When I do continuous replication from one db to another, I get a lot of bloat
over time.
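For context, the replication in question is the standard continuous replication started with a POST to the `_replicate` endpoint; a sketch of the request body (host and database names here are placeholders, not from this report):

```json
{
  "source": "http://upstream.example.com:5984/_users",
  "target": "_users",
  "continuous": true
}
```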
For example, replicating a _users db with a relatively low level of writes, and
around 30,000 documents, the size on disk of the downstream replica was over
300MB after 2 weeks. I compacted the DB, and the size dropped to about 20MB
(slightly smaller than the source database).
Of course, I realize that I can configure compaction to happen regularly. But
this still seems like a rather excessive tax. It is especially shocking to
users who replicate a 100GB database full of attachments and then find that it
has grown to 400GB because they weren't careful! You can easily end up without
enough free disk space to complete a compaction at all.
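(For reference, "configure compaction to happen regularly" here means the compaction daemon shipped since CouchDB 1.2; a minimal local.ini sketch, with illustrative thresholds rather than recommended values:

```ini
[compaction_daemon]
; how often (seconds) to check databases against the rules below
check_interval = 300
; ignore files smaller than this (bytes)
min_file_size = 131072

[compactions]
; compact any db/view whose fragmentation exceeds these percentages
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}]
```

This mitigates the symptom but, as noted, doesn't address the underlying disk-space tax.)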
Is there a fundamental reason why this happens? Or has it simply never been a
priority? It'd be awesome if replication were more efficient with disk space.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)