Track used space of database and view index files
-------------------------------------------------

                 Key: COUCHDB-1132
                 URL: https://issues.apache.org/jira/browse/COUCHDB-1132
             Project: CouchDB
          Issue Type: New Feature
          Components: Database Core
            Reporter: Filipe Manana
             Fix For: 1.2


Currently users have no reliable way to know if a database or view index 
compaction is needed.

Adam, Robert Dionne and I have been working on a feature to compute and 
expose the current data size (in bytes) of databases and view indexes. The 
computed value is exposed as a single field (data_size) in the database info 
and view index info URIs.

Comparing this new value with the disk_size value (the total space in bytes 
used by the database or view index file) would allow users to decide whether 
it's worth triggering a compaction.

Adam and Robert's work can be found at:

https://github.com/cloudant/bigcouch/compare/7d1adfa...a9410e6

Mine can be found at:

https://github.com/fdmanana/couchdb/compare/file_space

After chatting with Adam on IRC, the main difference seems to be that their 
work accounts only for user data (document bodies + attachments), while mine 
also accounts for the btree values (including all meta information, keys, rev 
trees, etc) and the data added by couch_file (4 bytes length prefix, md5s, 
block boundary markers).
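
To illustrate the kind of overhead couch_file adds, here is a rough model in 
Python. It is a sketch, not couch_file's actual code: the 4096-byte block size 
and the one-byte marker at the start of each block are assumptions that are not 
stated in this message, and bytes_on_disk is a hypothetical helper name.

```python
PREFIX_LEN = 4      # length prefix, as mentioned above
MD5_LEN = 16        # an MD5 digest is 16 bytes
BLOCK_SIZE = 4096   # ASSUMPTION: block size is not given in this message


def bytes_on_disk(payload_len, offset, with_md5=True, block_size=BLOCK_SIZE):
    """Estimate the bytes appended to the file for one chunk of payload.

    `offset` is the file position where the write starts. Every position
    that is a multiple of `block_size` is assumed to hold a one-byte
    block boundary marker.
    """
    remaining = PREFIX_LEN + (MD5_LEN if with_md5 else 0) + payload_len
    pos = offset
    while remaining > 0:
        if pos % block_size == 0:
            pos += 1  # skip over the assumed block boundary marker
        chunk = min(block_size - pos % block_size, remaining)
        pos += chunk
        remaining -= chunk
    return pos - offset
```

Under these assumptions, a 100 byte payload with an md5 costs 120 bytes when 
it fits inside one block, and one extra byte for each block boundary it crosses.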

An example:

$ curl http://localhost:5984/btree_db/_design/test/_info
{"name":"test","view_index":{"signature":"aba9f066ed7f042f63d245ce0c7d870e","language":"javascript","disk_size":274556,"data_size":90742,"updater_running":false,"compact_running":false,"waiting_commit":false,"waiting_clients":0,"update_seq":1004,"purge_seq":0}}

$ curl http://localhost:5984/btree_db
{"db_name":"btree_db","doc_count":1004,"doc_del_count":0,"update_seq":1004,"purge_seq":0,"compact_running":false,"disk_size":6197361,"data_size":6186460,"instance_start_time":"1303231080936421","disk_format_version":5,"committed_update_seq":1004}

This example was executed just after compacting the test database and view 
index. The new field "data_size" has a value very close to the final file size.
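
The comparison between data_size and disk_size described earlier could be 
sketched as follows in Python. The function names and the 0.7 threshold are 
hypothetical choices for illustration and are not part of either branch. The 
first set of figures is taken from the database info response above; the second 
is made up to show a fragmented file.

```python
def live_ratio(info):
    """Fraction of the file occupied by live data (data_size / disk_size)."""
    disk_size = info["disk_size"]
    if disk_size == 0:
        return 1.0  # an empty file has nothing to reclaim
    return info["data_size"] / disk_size


def should_compact(info, threshold=0.7):
    """Suggest compaction when live data falls below `threshold` of the file.

    The default threshold is an arbitrary illustrative value.
    """
    return live_ratio(info) < threshold


# Figures from the database info response quoted above (freshly compacted).
compacted_db = {"disk_size": 6197361, "data_size": 6186460}

# Made-up figures for a database with a lot of reclaimable space.
fragmented_db = {"disk_size": 10000000, "data_size": 2500000}

for name, info in [("compacted", compacted_db), ("fragmented", fragmented_db)]:
    print(name, round(live_ratio(info), 4), should_compact(info))
```

A client would fetch the info document over HTTP, as the curl commands above 
do, and pass the decoded JSON object to should_compact.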

The only things that my branch doesn't include in the data_size computation, 
for databases, are the size of the last header, the size of the _security 
object and the purged revs list. In practice these are so small that adding 
extra code to account for them doesn't seem worth it.

I'm sure we can merge the best from both branches.

Adam, Robert, thoughts?


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
