joshrodman opened a new issue #1329: Compaction daemon triggered repeatedly without reclaiming significant space
URL: https://github.com/apache/couchdb/issues/1329
 
 
   ## Expected Behavior
If the automatic compaction daemon is triggered by exceeding the 
db_fragmentation threshold, it should be able to run to completion and 
reclaim enough space that it does not run again (until such fragmentation 
occurs again).
   
   ## Current Behavior
   I have set my defaults to a fairly aggressive 20% db_fragmentation 
trigger for the compaction daemon, because my databases are sizable in 
proportion to the disks they reside on. What I am observing is that the 
fragmentation formula described in documentation across the web 
[(file_size - data_size) / file_size * 100] exceeds that threshold both 
_before_ and _after_ compaction runs for a number of my databases.
   
   (Note: there appears to be a glitch in the CouchDB documentation, since the 
database info response has no "file_size" field, only "disk_size". I am 
assuming this is a documentation error and that disk_size is what is meant.)
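
   For concreteness, here is a minimal sketch of the fragmentation check as I 
understand it, with disk_size standing in for file_size (Python; the database 
name and the default http://localhost:5984 address are placeholders for my 
setup, not part of CouchDB's tooling):

   ```python
   import json
   import urllib.request

   COUCH = "http://localhost:5984"  # assumed local CouchDB instance

   def fragmentation(db_name):
       """Documented formula, with disk_size standing in for file_size."""
       with urllib.request.urlopen(f"{COUCH}/{db_name}") as resp:
           info = json.load(resp)
       return (info["disk_size"] - info["data_size"]) / info["disk_size"] * 100

   # The daemon kicks off compaction once this exceeds the 20% trigger.
   print(fragmentation("mydatabase"))
   ```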
   
   ## Possible Solution
   I use CouchDB's space-saving compression options, and I suspect that the 
ratio of disk_size to data_size (which the compaction daemon uses as its 
threshold) is misleading the daemon. Another possible source of trouble is 
that the two metrics are not counting the same data, leaving residue that 
cannot be "compacted away," so the ratio can never be brought below the 
threshold by a compaction.
   
   ## Steps to Reproduce (for bugs)
   I don't know how to tell someone to reproduce this output, but as you can 
see from the following stats for one of my larger databases (which compacts 
constantly, and is compacting now), disk_size is quite a bit larger than 
data_size. If you can reproduce such a database with the configuration shown, 
I expect you would trigger this behavior.
   
   {
       "db_name": "mydatabase",
       "doc_count": 31252381,
       "doc_del_count": 0,
       "update_seq": 31269070,
       "purge_seq": 0,
       "compact_running": true,
       "disk_size": 26100674679,
       "data_size": 12123312190,
       "instance_start_time": "1526155173120000",
       "disk_format_version": 6,
       "committed_update_seq": 31269070
   }
   
   _default = [{db_fragmentation, "20%"}, {view_fragmentation, "20%"}]
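
   Plugging the stats above into the formula (with disk_size for file_size) 
shows why the daemon keeps firing:

   ```python
   disk_size = 26100674679   # from the database info above
   data_size = 12123312190

   fragmentation = (disk_size - data_size) / disk_size * 100
   print(round(fragmentation, 1))  # 53.6 -- far above the 20% trigger,
                                   # even while a compaction is running
   ```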
   
   ## Context
   Currently, I'm contemplating turning off the compaction daemon because it is 
constantly spinning its wheels. I may have to build my own mechanism that 
observes disk and data sizes and decides when it is best to compact, possibly 
once disk_size has grown [x%] from the size saved at the last compaction.
   
   (Note: such a compaction daemon option might be extremely useful: if every 
compaction saved the resulting disk_size, and compaction were only retriggered 
once disk_size grew by a configurable [x%], that would possibly dodge the issue 
I'm dealing with entirely, and could serve as an additional, complementary 
algorithm alongside the existing one. A rough sketch follows below.)
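
   To make that idea concrete, here is a rough sketch of such a growth-based 
trigger (Python; the /{db}/_compact endpoint is CouchDB's real API, but the 
daemon structure and names like GROWTH_PERCENT are purely illustrative, not an 
existing CouchDB feature):

   ```python
   import json
   import urllib.request

   COUCH = "http://localhost:5984"  # assumed local CouchDB instance
   GROWTH_PERCENT = 20.0            # the configurable [x%] growth allowance

   last_compacted_size = {}         # db_name -> disk_size after last compaction

   def disk_size(db_name):
       with urllib.request.urlopen(f"{COUCH}/{db_name}") as resp:
           return json.load(resp)["disk_size"]

   def maybe_compact(db_name):
       """Re-trigger compaction only once disk_size has grown [x%] past
       the size recorded at the end of the previous compaction."""
       size = disk_size(db_name)
       baseline = last_compacted_size.get(db_name)
       if baseline is None:
           last_compacted_size[db_name] = size   # first sighting: just record
           return
       if (size - baseline) / baseline * 100 >= GROWTH_PERCENT:
           req = urllib.request.Request(
               f"{COUCH}/{db_name}/_compact", data=b"", method="POST",
               headers={"Content-Type": "application/json"})
           urllib.request.urlopen(req)
           # A real daemon would wait for compact_running to clear before
           # recording the new baseline; this is simplified.
           last_compacted_size[db_name] = disk_size(db_name)
   ```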
   
   ## Your Environment
   CouchDB 1.6.1 (used as a private application JSON cache, accessible only to 
a local application)
   
   
