[
https://issues.apache.org/jira/browse/COUCHDB-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Filipe Manana updated COUCHDB-639:
----------------------------------
Description:
At the moment, replication uncompresses compressed attachments and then
compresses them again, wasting CPU time.
Push replication is also unreliable for very large attachments (500 MB+,
for example). Currently it sends the attachments inlined in the respective
JSON doc. Not only does this require too much RAM, it also wastes CPU time
doing the base64 encoding of the attachment (plus a decompression if the
attachment is compressed).
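As an illustration of the base64 overhead mentioned above (a hypothetical sketch, not part of the patch): inlining an attachment in a JSON doc base64-encodes it, which inflates the payload by roughly a third and forces the whole body to be built in memory.

```python
import base64

# Sketch of the base64 inflation: 3 raw bytes become 4 encoded bytes,
# so an inlined attachment costs ~33% more RAM and bandwidth than the
# raw (possibly compressed) bytes.
attachment = b"\x00" * 1_000_000          # 1 MB of raw attachment bytes
encoded = base64.b64encode(attachment)

print(len(attachment))                    # 1000000
print(len(encoded))                       # 1333336, roughly 4/3 of the raw size
```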
The following patch (rep-att-comp-and-multipart-trunk*.patch) addresses both
issues. Docs containing attachments are now streamed to the target remote DB
using the multipart doc streaming feature provided by couch_doc.erl, and
compressed attachments are no longer uncompressed and re-compressed during
replication.
JavaScript tests included.
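A rough sketch of the kind of multipart/related body the multipart doc streaming sends (boundary, doc id, and attachment contents here are illustrative, not from the patch): the JSON doc comes first, with each attachment stub marked "follows": true, and the raw, still-compressed attachment bytes follow in their own MIME parts, with no base64 and no inlining.

```python
import json

# Illustrative multipart/related body: JSON doc part, then one raw
# attachment part per "follows": true stub, then the closing delimiter.
boundary = b"abc123"
doc = {
    "_id": "example_doc",
    "_attachments": {
        "data.bin": {
            "content_type": "application/octet-stream",
            "length": 4,
            "follows": True,          # body arrives in a later MIME part
        }
    },
}
attachment_bytes = b"\xde\xad\xbe\xef"

body = (
    b"--" + boundary + b"\r\n"
    b"Content-Type: application/json\r\n\r\n"
    + json.dumps(doc).encode() + b"\r\n"
    b"--" + boundary + b"\r\n\r\n"
    + attachment_bytes + b"\r\n"
    b"--" + boundary + b"--"
)

print(body.count(b"--" + boundary))       # 3: two part delimiters + terminator
```

Because the attachment parts carry the stored bytes verbatim, each part can be streamed from disk in chunks instead of being buffered whole.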
Previously, replicating a DB containing 2 docs with attachments of 100 MB
and 500 MB caused the Erlang VM to consume nearly 1.2 GB of RAM on my
system. With this patch applied, it uses about 130 MB.
was:
Currently, when doing a pull replication where there are docs at the source DB
with compressed attachments:
1) The source decompresses the attachment before sending it to the target DB
2) The target compresses the attachment before storing it
Clearly a waste of CPU time, bandwidth, and disk I/O.
The following patch fixes the issue. JavaScript test included.
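The decompress/recompress round-trip described above can be sketched as follows (an illustrative example, not code from the patch): shipping the stored compressed bytes unchanged yields an equivalent attachment while skipping both CPU-heavy steps.

```python
import gzip

# Old behavior: the source gunzips the stored attachment and the target
# gzips it again, even though passing the compressed bytes through
# untouched produces an equivalent stored attachment.
stored = gzip.compress(b"some attachment body " * 1000)

# Wasteful round-trip (what the old replicator effectively did):
recompressed = gzip.compress(gzip.decompress(stored))

# Desired behavior (what the patch does): ship `stored` as-is.
assert gzip.decompress(recompressed) == gzip.decompress(stored)
```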
Summary: Make replication profit of attachment compression and improve
push replication for large attachments (was: Pull replication should profit of
compressed attachments for higher performance)
> Make replication profit of attachment compression and improve push
> replication for large attachments
> ----------------------------------------------------------------------------------------------------
>
> Key: COUCHDB-639
> URL: https://issues.apache.org/jira/browse/COUCHDB-639
> Project: CouchDB
> Issue Type: Improvement
> Components: Replication
> Affects Versions: 0.11
> Environment: trunk
> Reporter: Filipe Manana
> Attachments: pull-rep-att-comp-2.patch, pull-rep-att-comp.patch,
> rep-att-comp-and-multipart-trunk.patch
>
>
> At the moment, replication uncompresses compressed attachments and then
> compresses them again, wasting CPU time.
> Push replication is also unreliable for very large attachments (500 MB+,
> for example). Currently it sends the attachments inlined in the respective
> JSON doc. Not only does this require too much RAM, it also wastes CPU time
> doing the base64 encoding of the attachment (plus a decompression if the
> attachment is compressed).
> The following patch (rep-att-comp-and-multipart-trunk*.patch) addresses both
> issues. Docs containing attachments are now streamed to the target remote DB
> using the multipart doc streaming feature provided by couch_doc.erl, and
> compressed attachments are no longer uncompressed and re-compressed during
> replication.
> JavaScript tests included.
> Previously, replicating a DB containing 2 docs with attachments of 100 MB
> and 500 MB caused the Erlang VM to consume nearly 1.2 GB of RAM on my
> system. With this patch applied, it uses about 130 MB.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.