Eric Avdey created COUCHDB-2726:
-----------------------------------

             Summary: Remove compression over-optimization
                 Key: COUCHDB-2726
                 URL: https://issues.apache.org/jira/browse/COUCHDB-2726
             Project: CouchDB
          Issue Type: Improvement
      Security Level: public (Regular issues)
            Reporter: Eric Avdey


When file compression is set to snappy, couch does an additional 
optimization step: it also compresses the term with deflate, compares the 
sizes of the resulting binaries and keeps the smaller one. This leads to a 
situation where a "winning" deflated term gets decompressed and re-compressed 
on every document update, because with file_compression set to snappy 
deflate-compressed terms are not treated as already compressed. That 
behaviour exists to allow migration from deflate to snappy.
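
For illustration, here is a minimal sketch of the kind of selection logic 
described above (not the actual couch_compress code; the snappy NIF is 
assumed to be available, as it is in couch, and the leading tag byte is only 
illustrative):

    %% Sketch of the "pick whichever is smaller" step described above.
    compress_smallest(Term) ->
        Plain = term_to_binary(Term, [{minor_version, 1}]),
        %% deflate via term_to_binary's built-in zlib compression
        Deflated = term_to_binary(Term, [{compressed, 6}, {minor_version, 1}]),
        %% snappy via the NIF couch ships with (assumed on the code path)
        {ok, Snappied0} = snappy:compress(Plain),
        Snappied = <<1, Snappied0/binary>>,  %% illustrative snappy marker byte
        case byte_size(Snappied) =< byte_size(Deflated) of
            true  -> Snappied;
            false -> Deflated
        end.

On every subsequent write a term that came out deflated has to be 
decompressed and run through this comparison again, which is the churn 
described above.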

However, this optimization is a problem, because couch keeps the `body` field 
of the #doc record as a 2-element tuple of the compressed body and a 
compressed list of attachment pointers. If a document has no attachments, the 
pointers are an empty list, which always compresses better with deflate than 
with snappy. In other words, with file_compression set to snappy, almost 
every document in every database goes through a decompression/compression 
cycle on each write.
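
One way to reproduce the empty-list comparison (again assuming the snappy NIF 
is on the code path; the resulting byte sizes are left to the reader rather 
than quoted here):

    %% Returns {DeflateSize, SnappySize} for an empty attachment-pointer
    %% list, so the two compressed forms can be compared directly.
    empty_atts_sizes() ->
        Deflated = term_to_binary([], [{compressed, 6}, {minor_version, 1}]),
        {ok, Snappied} = snappy:compress(term_to_binary([], [{minor_version, 1}])),
        {byte_size(Deflated), byte_size(Snappied)}.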

A basic test shows that this compression optimization saves less than one 
percent of disk space on average, so it isn't worth trading that space for 
CPU cycles.

http://nbviewer.ipython.org/gist/eiri/79d91a797af9c6a6ff6d

I suggest removing this optimization altogether and simply following the 
configured option when choosing the compression library.
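
What I have in mind is roughly the following shape (a sketch, not a patch; it 
just dispatches on the configured method with no size comparison, and the 
snappy marker byte is again only illustrative):

    %% Sketch of the suggested behaviour: honour the configured method only.
    compress(Term, none) ->
        term_to_binary(Term, [{minor_version, 1}]);
    compress(Term, {deflate, Level}) ->
        term_to_binary(Term, [{compressed, Level}, {minor_version, 1}]);
    compress(Term, snappy) ->
        {ok, Bin} = snappy:compress(term_to_binary(Term, [{minor_version, 1}])),
        <<1, Bin/binary>>.  %% illustrative snappy marker byte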


