[ https://issues.apache.org/jira/browse/COUCHDB-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925263#comment-13925263 ]

Terin Stock commented on COUCHDB-2102:
--------------------------------------

While I also can't give you the database file, as it is over 150 GB, I can
share the configuration.

1. A fairly standard build of CouchDB 1.5
2. A local.ini configuration as follows:

{code:title=local.ini}
[couch_httpd_auth]
public_fields = appdotnet, avatar, avatarMedium, avatarLarge, date, email, fields, freenode, fullname, github, homepage, name, roles, twitter, type, _id, _rev
users_db_public = true

[httpd]
secure_rewrites = false

[couchdb]
delayed_commits = false
{code}
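
Not part of the original setup, but as a sanity check: on 1.x the effective
values can be read back through the _config API. The admin credentials below
are placeholders for whatever the server actually uses:

{code}
# Read back the effective settings; "admin:password" is a placeholder
curl http://admin:password@localhost:5984/_config/couch_httpd_auth/public_fields
curl http://admin:password@localhost:5984/_config/couchdb/delayed_commits
{code}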

3. Set up replication with:

{code}
curl -X POST http://localhost:5984/_replicator \
  -H "Content-Type: application/json" \
  -d '{"_id":"fullfatdb","source":"https://fullfatdb.npmjs.com/registry","target":"registry","continuous":true,"user_ctx":{"name":"admin","roles":["_admin"]}}'
{code}
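
Once that document is written, the replicator records a _replication_state
field on it, and the running job shows up in _active_tasks, so either can be
used to confirm the replication actually started:

{code}
# Check the state the replicator recorded on the document
curl http://localhost:5984/_replicator/fullfatdb

# List running replication (and compaction) tasks
curl http://localhost:5984/_active_tasks
{code}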

> Downstream replicator database bloat
> ------------------------------------
>
>                 Key: COUCHDB-2102
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2102
>             Project: CouchDB
>          Issue Type: Bug
>      Security Level: public (Regular issues)
>          Components: Replication
>            Reporter: Isaac Z. Schlueter
>
> When I do continuous replication from one db to another, I get a lot of bloat 
> over time.
> For example, replicating a _users db with a relatively low level of writes, 
> and around 30,000 documents, the size on disk of the downstream replica was 
> over 300MB after 2 weeks.  I compacted the DB, and the size dropped to about 
> 20MB (slightly smaller than the source database).
> Of course, I realize that I can configure compaction to happen regularly.  
> But this still seems like a rather excessive tax.  It is especially shocking 
> to users who are replicating a 100GB database full of attachments, and find 
> it grow to 400GB if they're not careful!  You can easily end up in a 
> situation where you don't have enough disk space to successfully compact.
> Is there a fundamental reason why this happens?  Or has it simply never been 
> a priority?  It'd be awesome if replication were more efficient with disk 
> space.
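
For reference, on 1.x compaction of the bloated replica can be triggered
manually with the standard _compact endpoint (the database name below is the
replication target from the comment above); this reclaims the space but does
not address the underlying bloat:

{code}
# Manually compact the downstream replica ("registry" is the target used above)
curl -X POST http://localhost:5984/registry/_compact \
  -H "Content-Type: application/json"
{code}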


