I submitted a Jira ticket for something similar, but not nearly so low-level. By breaking a .couch file into a series of files, each of modest size, we could compact piecemeal. Once one of the files has less than, say, 50% valid records, the remaining live records can be read and written to the tail file, and the old file can be deleted. Berkeley JE works this way: files are ordered by their names (dbname-00000000.couch, say), and a compactor runs continuously, monitoring the used space in each file.
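Roughly, such a compactor could look like the sketch below (Python only for illustration; `Segment`, `read_live_records`, `append_to_tail`, and the 50% figure are placeholders, not CouchDB or Berkeley JE APIs):

    import os

    COMPACT_THRESHOLD = 0.5   # rewrite a segment once less than half of it is live

    class Segment:
        """One closed file in the series, e.g. dbname-00000003.couch (illustrative)."""
        def __init__(self, path):
            self.path = path
            self.total_bytes = max(os.path.getsize(path), 1)
            self.live_bytes = self.total_bytes    # decremented as records are superseded

        def live_fraction(self):
            return self.live_bytes / self.total_bytes


    def compact_step(segments, read_live_records, append_to_tail):
        """One pass of a continuous compactor over the segment series.

        segments          -- closed Segment objects, oldest first (tail file excluded)
        read_live_records -- callable yielding the still-valid records of a segment
        append_to_tail    -- callable appending one record to the current tail file
        Returns True if a segment was reclaimed this pass.
        """
        candidates = [s for s in segments if s.live_fraction() < COMPACT_THRESHOLD]
        if not candidates:
            return False                           # nothing is wasteful enough yet
        victim = min(candidates, key=lambda s: s.live_fraction())
        for record in read_live_records(victim):
            append_to_tail(record)                 # live data migrates to the tail
        segments.remove(victim)
        os.remove(victim.path)                     # the whole old file is freed at once
        return True

The nice property is that reclaiming space never requires rewriting the whole database at once, only the single worst segment.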
B.

On Tue, Dec 22, 2009 at 5:40 PM, Damien Katz <[email protected]> wrote:
>
> On Dec 22, 2009, at 11:56 AM, Chris Anderson wrote:
>
>> On Mon, Dec 21, 2009 at 2:20 PM, Damien Katz <[email protected]> wrote:
>>> I saw recently some issues people were having with compaction, and I
>>> thought I'd get some thoughts down about ways to improve the compaction
>>> code/experience.
>>>
>>> 1. Multi-process pipeline processing. Similar to the enhancements to the
>>> view indexing, there are opportunities for pipelining operations instead of
>>> the current read/write batch operations it does. This can reduce memory
>>> usage and make compaction faster.
>>> 2. Multiple disks/mount points. CouchDB could easily have 2 or more
>>> database dirs, and each time it compacts, it copies the new database file
>>> to another dir/disk/mountpoint. For servers with multiple disks this will
>>> greatly smooth the copying, as the disk heads won't need to seek between
>>> reads and writes.
>>> 3. Better compaction algorithms. There are all sorts of clever things that
>>> could be done to make the compaction faster. Right now it rebuilds the
>>> database in a similar manner as if its clients were bulk updating it.
>>> This was the simplest way to do it, but certainly not the fastest. There
>>> are a lot of ways to make this much more efficient; they just take more work.
>>> 4. Tracking wasted space. This can be used to determine the threshold for
>>> compaction. We don't need to track with 100% accuracy how much disk space
>>> is being wasted, but it would be a big improvement to at least know how
>>> much disk space the raw docs take, and maybe calculate an estimate of the
>>> indexes necessary to support them in a freshly compacted database.
>>> 5. Better low-level file driver support. Because we are using the Erlang
>>> built-in file system drivers, we don't have access to a lot of flags. If we
>>> had our own drivers, one option we'd like to use is to not OS-cache the
>>> reads and writes during compaction; it's unnecessary for compaction, and
>>> it could completely consume the cache with rarely accessed data, evicting
>>> lots of recently used live data and greatly hurting the performance of other
>>> databases.
>>>
>>> Anyway, just getting these thoughts out. More ideas and especially code
>>> welcome.
>>>
>>> -Damien
>>
>> Another thing worth considering is that if we get block alignment
>> right, then our copy-to-a-new-file compaction could end up working as
>> compact-in-place on content-addressable filesystems. Most of the
>> blocks won't change content, so the FS can just write new pointers to
>> existing blocks, and then garbage collect unneeded blocks later. If we
>> get the block alignment right...
>
> I think that requires rearranging the blocks, which means working below the
> FS level (using what is essentially our own file system), or using a
> file system that exposes raw file block management (does such a thing exist?).
> Would be cool though.
>
>> Chris
>>
>> --
>> Chris Anderson
>> http://jchrisa.net
>> http://couch.io
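To make the block-alignment idea from the quoted exchange a bit more concrete: if each record is padded out to a filesystem block boundary, a copy-based compaction re-emits most blocks byte-for-byte, which a content-addressable or deduplicating filesystem could then share rather than rewrite. A minimal sketch, assuming a 4 KB block size and a simple length-prefix framing (neither is the actual .couch on-disk format):

    BLOCK_SIZE = 4096  # assumed filesystem block size, not CouchDB's real layout

    def append_block_aligned(fh, record: bytes) -> int:
        """Append one record padded to a whole number of blocks; return its offset.

        Because every record begins and ends on a block boundary, copying it
        unchanged into a new file reproduces identical blocks, which a
        deduplicating filesystem could share instead of storing again.
        The 8-byte length prefix is assumed framing, not the real format.
        """
        offset = fh.tell()
        assert offset % BLOCK_SIZE == 0, "writes must start on a block boundary"
        fh.write(len(record).to_bytes(8, "big"))    # length prefix
        fh.write(record)
        used = 8 + len(record)
        fh.write(b"\x00" * ((-used) % BLOCK_SIZE))  # zero-pad to the next boundary
        return offset

Whether a given filesystem actually shares those blocks depends on its own block size and dedup semantics; the point is only that alignment keeps the blocks identical across copies, at the cost of some padding overhead for small records.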
