Eric Avdey created COUCHDB-2726:
-----------------------------------
Summary: Remove a compression's over-optimization
Key: COUCHDB-2726
URL: https://issues.apache.org/jira/browse/COUCHDB-2726
Project: CouchDB
Issue Type: Improvement
Security Level: public (Regular issues)
Reporter: Eric Avdey
When file_compression is set to snappy, couch performs an additional
optimization step: it also compresses the term with deflate, compares the
sizes of the resulting binaries and keeps the smaller one. This leads to a
situation where the "winning" deflated term gets decompressed and compressed
back on every document update, because deflate-compressed terms are not
recognized as already compressed when file_compression is set to snappy. Not
recognizing them is deliberate, as it allows migration from deflate to snappy.
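Roughly, the current snappy path behaves like the sketch below (a simplified
illustration, not the actual couch_compress code; the module name, the 1-byte
snappy marker and the fixed deflate level are assumptions, and it requires the
snappy NIF that ships with CouchDB):

    -module(compress_sketch).
    -export([compress/2]).

    %% Simplified sketch of the current behaviour when file_compression =
    %% snappy: the term is compressed both ways and the smaller binary is
    %% kept, so a deflated term can "win" even though snappy was configured.
    compress(Term, snappy) ->
        Snappy = snappy_compress(Term),
        Deflate = deflate_compress(Term),
        case byte_size(Deflate) < byte_size(Snappy) of
            true  -> Deflate;
            false -> Snappy
        end.

    %% Assumed snappy path: compress the external term format and tag it
    %% with a 1-byte marker so it can be recognized later.
    snappy_compress(Term) ->
        {ok, Bin} = snappy:compress(term_to_binary(Term)),
        <<1, Bin/binary>>.

    %% Deflate path: term_to_binary/2 with the compressed option produces a
    %% zlib-deflated term (level 6 is only an example).
    deflate_compress(Term) ->
        term_to_binary(Term, [{minor_version, 1}, {compressed, 6}]).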
However, this optimization is a problem, because couch keeps the `body` field
of the #doc record as a 2-element tuple of the compressed body and the
compressed list of attachment pointers. If a document has no attachments, the
pointers are an empty list, which always compresses better with deflate than
with snappy. In other words, with file_compression set to snappy almost every
document in every database goes through a decompression/compression cycle on
each write.
A basic test shows that this compression optimization saves on average less
than one percent of disk space, so it is not worth trading that space for CPU
cycles:
http://nbviewer.ipython.org/gist/eiri/79d91a797af9c6a6ff6d
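To make the empty-attachments case concrete, the two encodings can be compared
directly in an Erlang shell, assuming the snappy NIF from CouchDB's deps is
loaded (the deflate level is again only an example):

    1> Plain = term_to_binary([]).    %% <<131,106>>, the external-format NIL
    2> {ok, S} = snappy:compress(Plain), Snappy = <<1, S/binary>>.
    3> Deflate = term_to_binary([], [{minor_version, 1}, {compressed, 6}]).
    4> byte_size(Deflate) < byte_size(Snappy).
    %% true - the deflate branch's output is smaller, so it is what gets
    %% stored, only to be re-encoded again on the next write.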
I suggest removing this optimization altogether and simply following the
configured option when choosing the compression library.
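With the over-optimization removed, compress/2 would just honour the
configured method; a rough sketch of what that could look like (illustrative
only, not a patch against couch_compress):

    %% Sketch of the proposed behaviour: use only the configured method.
    compress(Term, snappy) ->
        {ok, Bin} = snappy:compress(term_to_binary(Term)),
        <<1, Bin/binary>>;
    compress(Term, {deflate, Level}) ->
        term_to_binary(Term, [{minor_version, 1}, {compressed, Level}]).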
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)