I think sizes.active should be a close approximation of the size of the database after compaction; i.e. it should be possible to use (sizes.file - sizes.active) as a way to estimate the number of bytes that can be reclaimed by compacting that database shard.
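
A minimal sketch of that estimate, assuming the `sizes` object has already been read from one shard's database info and holds integer byte counts under the keys `file`, `external`, and `active`. The function name and the sample values are hypothetical, for illustration only:

```python
def estimate_reclaimable_bytes(sizes: dict) -> int:
    """Estimate how many bytes compacting this shard could free.

    Uses the (sizes.file - sizes.active) difference described above.
    """
    # Clamp at zero in case active is ever reported as larger than file.
    return max(sizes["file"] - sizes["active"], 0)


# Hypothetical numbers, not taken from a real database.
example_sizes = {"file": 1_000_000, "external": 1_200_000, "active": 600_000}
print(estimate_reclaimable_bytes(example_sizes))
```

The estimate is only as good as the assumption that sizes.active approximates the post-compaction size. If active also counts older revisions that compaction would drop, the real savings would be larger than this number.
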

Adam

> On Oct 22, 2018, at 4:32 PM, Eiri <e...@eiri.ca> wrote:
>
> Dear all,
>
> I’d like to hear your opinion on how we should interpret the database
> attribute “active size”.
>
> As you surely know, we use three different size attributes in the database
> info: file - the size of the database file on disk; external - the
> uncompressed size of database contents; and active, defined as “the size of
> live data inside the database” or “active byte in the current MVCC snapshot”.
>
> Some time ago I had a discussion with Paul Davis, and he pointed out an
> ambiguity in that definition, namely - is it live data before a compaction or
> after a compaction? To put it another way: should we treat as “active” only
> the documents and attachments on the btree’s leaves, or also include the
> previous document revisions while they can still be accessed? Codewise it is
> the latter, both in the current version of CouchDB and in the 1.x version,
> where active size was named data_size, but intuitively it feels that it
> should be the former.
>
> Although this sounds academic, it is a practical question: the difference in
> active size before and after compaction can be rather noticeable, and since
> it is used as a trigger by the compaction daemon, it could skew the disk
> usage pattern.
>
> Please share your thoughts. If we conclude that we want to change how active
> size is calculated, I’m willing to take on the implementation, as I have a
> recent PR around the same area of code.
>
> Regards,
> Eric