joshrodman opened a new issue #1329: Compaction daemon triggered repeatedly without reclaiming significant space
URL: https://github.com/apache/couchdb/issues/1329

## Expected Behavior

If the automatic compaction daemon is triggered by exceeding the db_fragmentation threshold, it should run to completion and reclaim enough space that it does not run again (until such fragmentation accumulates again).

## Current Behavior

I have set a fairly aggressive trigger of 20% db_fragmentation for the compaction daemon, because my databases are sizable in proportion to the disks they reside on. What I am observing is that the formula described in documentation across the web, `(file_size - data_size) / file_size * 100`, exceeds the threshold both _before_ and _after_ compaction runs for a number of my databases.

(Note: there appears to be a glitch in the CouchDB documentation, since there is no "file_size", only "disk_size". I am assuming this is an error in the documentation and that disk_size is what is meant.)

## Possible Solution

I use CouchDB's space-saving compression options, and I suspect that the ratio of disk_size to data_size (which the compaction daemon uses as its threshold) is misleading the daemon. Another possible source of trouble is that the two metrics may not be counting the same piles of data, leaving residue that cannot be "compacted away" and introducing the opportunity for the ratio to never be satisfied by a compaction.

## Steps to Reproduce (for bugs)

I don't know how to tell someone to reproduce this from scratch, but as you can see from the following output from one of my larger databases (which compacts constantly, and is compacting now), disk_size is quite a bit larger than data_size. If you can reproduce such a database with the configuration shown, I expect you would trigger this behavior.
```json
{
  "db_name": "mydatabase",
  "doc_count": 31252381,
  "doc_del_count": 0,
  "update_seq": 31269070,
  "purge_seq": 0,
  "compact_running": true,
  "disk_size": 26100674679,
  "data_size": 12123312190,
  "instance_start_time": "1526155173120000",
  "disk_format_version": 6,
  "committed_update_seq": 31269070
}
```

```
_default = [{db_fragmentation, "20%"}, {view_fragmentation, "20%"}]
```

## Context

Currently, I'm contemplating turning off the compaction daemon because it is constantly spinning its wheels. I may have to build my own mechanism to observe disk and data sizes and determine when it is best to compact, possibly after disk_size has grown [x%] from a size saved at the last compaction.

(Note: such a compaction daemon option might be extremely useful: if every compaction saved the resulting disk_size, and compaction was only re-triggered when disk_size grew by a configurable [x%], that would possibly dodge the issue I'm dealing with entirely, and could be an additional, complementary compaction algorithm alongside the existing one.)

## Your Environment

CouchDB 1.6.1 (used as a private application JSON cache, accessible only to a local application)
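To make the problem concrete, here is a minimal sketch (in Python, purely for illustration) of the fragmentation formula applied to the db info output above. It shows the database sits far above the 20% trigger even though compaction keeps running:

```python
def db_fragmentation(disk_size, data_size):
    """Fragmentation metric described in the docs:
    (disk_size - data_size) / disk_size * 100
    (using disk_size where the docs say "file_size")."""
    return (disk_size - data_size) / disk_size * 100.0

# Values taken from the database info shown above.
frag = db_fragmentation(disk_size=26100674679, data_size=12123312190)
print(f"fragmentation: {frag:.1f}%")  # well above the 20% trigger
```

With these numbers the metric comes out around 53%, so if compaction cannot shrink disk_size toward data_size (e.g. because of compression or residue the metrics count differently), the daemon will fire again immediately.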
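The growth-based re-trigger proposed above could be sketched as follows. This is a hypothetical illustration, not an existing CouchDB feature; the function name and the 20% default are assumptions:

```python
def should_compact(disk_size, last_compacted_disk_size, growth_pct=20.0):
    """Hypothetical growth-based trigger: compact only once disk_size
    has grown by growth_pct percent beyond the size recorded right
    after the previous compaction."""
    if last_compacted_disk_size <= 0:
        return True  # no baseline recorded yet
    growth = (disk_size - last_compacted_disk_size) / last_compacted_disk_size * 100.0
    return growth >= growth_pct

# If the last compaction left the file at ~26.1 GB, a daemon using this
# rule would not re-trigger at 27 GB (~3.4% growth) but would at 32 GB
# (~22.6% growth).
print(should_compact(27_000_000_000, 26_100_674_679))  # False
print(should_compact(32_000_000_000, 26_100_674_679))  # True
```

Unlike the fragmentation ratio, this trigger cannot loop: immediately after a compaction the growth is zero by construction, so the daemon stays idle until the file actually grows.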
