[
https://issues.apache.org/jira/browse/COUCHDB-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924418#comment-13924418
]
Alexander Shorin commented on COUCHDB-2102:
-------------------------------------------
Fedor Indutny (@indutny on IRC) recently reported the same behavior with the
1.5.0 release and filtered replication of the npm registry. While the actual
data is under 300 MB, the database's disk size grew to 81 GB:
{code}
{
    "committed_update_seq": 224,
    "disk_format_version": 6,
    "instance_start_time": "1393936995838019",
    "db_name": "yandex-packages",
    "doc_count": 208,
    "doc_del_count": 0,
    "update_seq": 224,
    "purge_seq": 0,
    "compact_running": false,
    "disk_size": 81703006328,
    "data_size": 279752269
}
{code}
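For reference, the ratio of {{disk_size}} to {{data_size}} in the info above quantifies the bloat directly. A minimal sketch (plain Python; the field names are those returned by CouchDB's GET /{db} endpoint, and the numbers are copied from the info block above):
{code}
import json

# Database info as posted above (trimmed to the two relevant fields).
db_info = json.loads("""{
  "disk_size": 81703006328,
  "data_size": 279752269
}""")

# Bloat factor: how many times larger the file is than the live data.
bloat = db_info["disk_size"] / db_info["data_size"]
print(f"bloat factor: {bloat:.0f}x")  # roughly 292x for the numbers above
{code}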
I have failed to reproduce this, but it seems this is not a local anomaly.
Could you, [~isaacs], [~terinjokes], and everyone else, provide some more
information about your environment? It would also be great if you could share
any database file that suffered from this bug, so we can investigate what that
bloat data actually is.
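As a workaround until the root cause is found, a periodic job can trigger compaction only when the file has grown well past the live data. A sketch under stated assumptions: {{should_compact}} is a hypothetical helper, the 2x threshold is an arbitrary choice, and compaction itself is started with POST /{db}/_compact:
{code}
def should_compact(db_info, threshold=2.0):
    """Return True when the on-disk file exceeds the live data by `threshold`x.

    `db_info` is the JSON returned by CouchDB's GET /{db} endpoint; guard
    against empty databases where data_size is 0 or missing.
    """
    data_size = db_info.get("data_size") or 0
    if data_size == 0:
        return False
    return db_info["disk_size"] / data_size > threshold

# With the numbers from the comment above, compaction is clearly warranted:
print(should_compact({"disk_size": 81703006328, "data_size": 279752269}))
# When True, start compaction with, e.g.:
#   curl -X POST http://localhost:5984/yandex-packages/_compact \
#        -H "Content-Type: application/json"
{code}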
> Downstream replicator database bloat
> ------------------------------------
>
> Key: COUCHDB-2102
> URL: https://issues.apache.org/jira/browse/COUCHDB-2102
> Project: CouchDB
> Issue Type: Bug
> Security Level: public(Regular issues)
> Components: Replication
> Reporter: Isaac Z. Schlueter
>
> When I do continuous replication from one db to another, I get a lot of bloat
> over time.
> For example, replicating a _users db with a relatively low level of writes,
> and around 30,000 documents, the size on disk of the downstream replica was
> over 300MB after 2 weeks. I compacted the DB, and the size dropped to about
> 20MB (slightly smaller than the source database).
> Of course, I realize that I can configure compaction to happen regularly.
> But this still seems like a rather excessive tax. It is especially shocking
> to users who are replicating a 100GB database full of attachments, and find
> it grow to 400GB if they're not careful! You can easily end up in a
> situation where you don't have enough disk space to successfully compact.
> Is there a fundamental reason why this happens? Or has it simply never been
> a priority? It'd be awesome if replication were more efficient with disk
> space.
--
This message was sent by Atlassian JIRA
(v6.2#6252)