[
https://issues.apache.org/jira/browse/COUCHDB-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628289#comment-14628289
]
ASF subversion and git services commented on COUCHDB-2726:
----------------------------------------------------------
Commit 3a26ea1ba09e50da3b97b64e6e1ebf75c9406202 in couchdb-couch's branch
refs/heads/master from [~eiri]
[ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=3a26ea1 ]
Remove compression's optimization
When file compression is set to snappy, couch performs an additional
optimization step: it also compresses the term with deflate,
compares the sizes of the resulting binaries and chooses the smaller one.
As a result, in a snappy-compressed database the
'winning' deflate-compressed term is decompressed and compressed
back into deflate on each document write.
This patch removes that optimization.
[Basic test](http://nbviewer.ipython.org/gist/eiri/79d91a797af9c6a6ff6d)
demonstrates that the disk space gained with it is not significant
enough to justify the wasted CPU cycles.
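The recompression cycle described in this message can be illustrated with a small simulation. This is only a sketch under stated assumptions, not couch's Erlang code: the one-byte tags, the function names and the `pick_smaller` flag are all hypothetical, JSON stands in for Erlang term serialization, and `lzma` stands in for snappy purely because snappy is not in the Python standard library. The only property relied on is that the stand-in codec produces a larger result than deflate for an empty list.

```python
import json
import lzma
import zlib

# Hypothetical one-byte tags marking which codec produced a stored blob.
DEFLATE, FAST = b"D", b"S"


def compress(term, method, pick_smaller=False):
    """Serialize `term` and compress it with `method`.

    With pick_smaller=True and method FAST, both codecs are tried and the
    smaller result is kept, mirroring the optimization being removed.
    """
    raw = json.dumps(term).encode()
    deflated = DEFLATE + zlib.compress(raw)
    if method == DEFLATE:
        return deflated
    fast = FAST + lzma.compress(raw)  # stand-in for snappy, not snappy itself
    if pick_smaller and len(deflated) < len(fast):
        return deflated
    return fast


def decompress(blob):
    """Dispatch on the tag byte and return the original term."""
    tag, payload = blob[:1], blob[1:]
    raw = zlib.decompress(payload) if tag == DEFLATE else lzma.decompress(payload)
    return json.loads(raw)


def write(stored, configured, pick_smaller):
    """Simulate one document write; return (blob, whether it was recompressed)."""
    if stored[:1] == configured:
        return stored, False  # already in the configured format, nothing to do
    # Tag differs from the configured codec: treat it as a migration.
    return compress(decompress(stored), configured, pick_smaller), True


def count_recompressions(pick_smaller, writes=5):
    """Count recompressions of an empty attachment-pointer list over `writes` updates."""
    blob = compress([], FAST, pick_smaller)
    total = 0
    for _ in range(writes):
        blob, recompressed = write(blob, FAST, pick_smaller)
        total += recompressed
    return total
```

Under these assumptions, `count_recompressions(True)` reports a recompression on every write, because the deflate result keeps winning and never matches the configured codec, while `count_recompressions(False)` reports none.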
This closes COUCHDB-2726
> Remove a compression's over-optimization
> ----------------------------------------
>
> Key: COUCHDB-2726
> URL: https://issues.apache.org/jira/browse/COUCHDB-2726
> Project: CouchDB
> Issue Type: Improvement
> Security Level: public(Regular issues)
> Reporter: Eric Avdey
> Assignee: Eric Avdey
> Priority: Minor
>
> When file compression is set to snappy, couch performs an additional
> optimization step: it also compresses the term with deflate, compares the
> sizes of the resulting binaries and chooses the smaller one. As a result, the
> "winning" deflated term is decompressed and compressed back
> on each document update, because deflate-compressed terms are not
> recognized when the option file_compression is set to snappy. This behaviour
> exists to allow migration from deflate to snappy.
> However, this optimization is a problem, because couch keeps the field `body`
> in the #doc record as a two-element tuple of the compressed body and the
> compressed list of attachment pointers. If the document has no attachments,
> the pointers are an empty list, which deflate always compresses better than
> snappy. In other words, if the option file_compression is set to snappy, almost
> every document in all databases goes through a decompression/compression cycle
> on each write.
> A basic test shows that this compression optimization saves, on average, less
> than one percent of the disk space, so the saved space is not worth the CPU
> cycles spent on it.
> http://nbviewer.ipython.org/gist/eiri/79d91a797af9c6a6ff6d
> I suggest removing this optimization altogether and simply following the
> configured option when choosing the compression library.