[
https://issues.apache.org/jira/browse/COUCHDB-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642611#comment-14642611
]
ASF GitHub Bot commented on COUCHDB-2726:
-----------------------------------------
Github user eiri closed the pull request at:
https://github.com/apache/couchdb-couch/pull/61
> Remove a compression's over-optimization
> ----------------------------------------
>
> Key: COUCHDB-2726
> URL: https://issues.apache.org/jira/browse/COUCHDB-2726
> Project: CouchDB
> Issue Type: Improvement
> Security Level: public(Regular issues)
> Reporter: Eric Avdey
> Assignee: Eric Avdey
> Priority: Minor
>
> When file compression is set to snappy, couch performs an additional
> optimization step: it also compresses the term with deflate, compares the
> sizes of the resulting binaries and chooses the smaller one. As a result, a
> "winning" deflated term is decompressed and compressed again on each
> document update, because deflate-compressed terms are not recognized as
> already compressed when the file_compression option is set to snappy. That
> recompression of unrecognized terms exists to allow migration from deflate
> to snappy.
> However, this optimization is a problem, because couch keeps the `body`
> field of the #doc record as a two-element tuple of the compressed body and
> the compressed list of attachment pointers. If the document has no
> attachments, the pointers are an empty list, which deflate always compresses
> better than snappy. In other words, if the file_compression option is set to
> snappy, almost every document in every database goes through a
> decompression/compression cycle on each write.
> A basic test shows that this compression optimization saves, on average,
> less than one percent of disk space, so the space saved is not worth the
> extra CPU cycles.
> http://nbviewer.ipython.org/gist/eiri/79d91a797af9c6a6ff6d
> I suggest removing this optimization altogether and simply following the
> configured option when choosing the compression library.
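
The cycle described in the issue can be modelled with a small Python sketch. This is not CouchDB's Erlang code. Every name in it (CODECS, compress, update, count_recompressions) is hypothetical, and bz2 is used only as a stand-in for snappy, which is not in the Python standard library. The stand-in is relevant only because, like snappy in the issue, it loses to deflate on an empty list.

```python
import bz2
import pickle
import zlib

# Hypothetical codec table: name -> (compress, decompress).
# ASSUMPTION: bz2 merely stands in for snappy; it is not what CouchDB uses.
CODECS = {
    "deflate": (zlib.compress, zlib.decompress),
    "snappy_stand_in": (bz2.compress, bz2.decompress),
}


def compress(term, method, pick_smaller):
    """Serialize and compress a term, returning a (tag, blob) pair."""
    raw = pickle.dumps(term)
    tag, blob = method, CODECS[method][0](raw)
    if pick_smaller and method != "deflate":
        # The "over-optimization": also try deflate and keep the smaller blob.
        alt = zlib.compress(raw)
        if len(alt) < len(blob):
            tag, blob = "deflate", alt
    return tag, blob


def decompress(stored):
    """Invert compress() using the tag stored next to the blob."""
    tag, blob = stored
    return pickle.loads(CODECS[tag][1](blob))


def update(stored, method, pick_smaller):
    """Model one document write; return (stored, was_recompressed)."""
    tag, _ = stored
    if tag == method:
        # Already in the configured format: reuse the blob untouched.
        return stored, False
    # Unrecognized format: decompress and compress again (the migration path).
    return compress(decompress(stored), method, pick_smaller), True


def count_recompressions(term, method, pick_smaller, writes=5):
    """Count how many of `writes` updates trigger a recompression."""
    stored = compress(term, method, pick_smaller)
    count = 0
    for _ in range(writes):
        stored, redone = update(stored, method, pick_smaller)
        count += redone
    return count


if __name__ == "__main__":
    # An empty list models the attachment pointers of a document
    # that has no attachments.
    print(count_recompressions([], "snappy_stand_in", pick_smaller=True))
    print(count_recompressions([], "snappy_stand_in", pick_smaller=False))
```

In this model, with pick_smaller enabled the empty list is stored with the deflate tag, so each write takes the migration path again. With it disabled, the stored tag always matches the configured method and no write recompresses.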