Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "ArchitectureOverview" page has been changed by tuxracer69.
http://wiki.apache.org/cassandra/ArchitectureOverview?action=diff&rev1=6&rev2=7

--------------------------------------------------

  * Data File ('''SSTable'''). An SSTable (terminology borrowed from Google) stands for Sorted Strings Table and is a file of key/value string pairs, sorted by keys.
  * Index File ('''SSTable Index'''). (Similar to Hadoop !MapFile / Tfile)
  * (Key, offset) pairs (points into data file)
- * Bloom filter (all keys in data file)
+ * '''Bloom filter''' (all keys in data file). A [[http://en.wikipedia.org/wiki/Bloom_filter|Bloom filter]] is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Cassandra uses bloom filters to save IO when performing a key lookup: each SSTable has a bloom filter associated with it that Cassandra checks before doing any disk seeks, making queries for keys that don't exist almost free (a seek is only paid on the occasional false positive). Bloom filters are surprisingly simple: divide a memory area into buckets (one bit per bucket for a standard bloom filter; more, typically four, for a counting bloom filter). To insert a key, generate several hashes per key and mark the bucket for each hash. To check whether a key is present, check each of its buckets; if any bucket is empty, the key was never inserted into the filter. If all buckets are non-empty, though, the key is only probably present: other keys' hashes could have covered the same buckets. See [[http://spyced.blogspot.com/2009/01/all-you-ever-wanted-to-know-about.html|All you ever wanted to know about writing bloom filters]] for details, in particular why a really good hash output distribution is important.
+
+
+
  * When a commit log has had all its column families pushed to disk, it is deleted
  * '''Compaction''': Data files accumulate over time. Periodically, data files are merge-sorted into a new file (and a new index is created)
  * Merge keys
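The bloom filter description in this change can be illustrated with a short Python sketch. It is not Cassandra's implementation: the bit-array size, the number of hashes, and deriving the bucket positions from one SHA-256 digest by double hashing are assumptions chosen for brevity. The hypothetical `SSTableSketch` class only shows where the filter sits on the read path: it is consulted before the index lookup that stands in for a disk seek.

```python
import bisect
import hashlib


class BloomFilter:
    """Standard bloom filter: one bit per bucket, several hash positions per key."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Assumption: derive the positions from one SHA-256 digest via double
        # hashing, h_i = h1 + i * h2 (mod num_bits), instead of independent hashes.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        # Mark the bucket for each hash of the key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # Any empty bucket proves the key was never inserted; all buckets set
        # only means "probably present" (false positives are possible).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class SSTableSketch:
    """Hypothetical in-memory stand-in for an SSTable with its index and filter."""

    def __init__(self, rows: dict) -> None:
        self.keys = sorted(rows)  # data is kept sorted by key
        self.values = [rows[k] for k in self.keys]
        self.bloom = BloomFilter()
        for k in self.keys:
            self.bloom.add(k)
        self.seeks = 0  # counts simulated disk seeks

    def get(self, key: str):
        if not self.bloom.might_contain(key):
            return None  # definitely absent: no seek needed
        # A real SSTable would now consult its (key, offset) index and seek
        # into the data file; a binary search stands in for that here.
        self.seeks += 1
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None  # false positive: a seek was paid but nothing was found
```

Querying many absent keys against `SSTableSketch({"alice": "1", "bob": "2"})` should leave `seeks` at or near zero, while every present key costs exactly one simulated seek. The double-hashing trick is a common way to avoid computing many independent hash functions; with a power-of-two bit count, forcing the step to be odd keeps the positions for one key distinct.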

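The compaction step described in this change (data files merge-sorted into a new file, with a new index created) can be sketched as a k-way merge of sorted runs. This is a simplified illustration under assumptions not stated on the page: each input is a list of unique (key, value) pairs already sorted by key, inputs are passed oldest first, and on a duplicate key the newest file's value wins outright. The function name `compact` is hypothetical, and the index offset here is a position in the output list rather than a byte offset into a file.

```python
import heapq
from typing import List, Tuple

Row = Tuple[str, str]


def compact(sstables: List[List[Row]]) -> Tuple[List[Row], List[Tuple[str, int]]]:
    """Merge sorted (key, value) runs into one sorted run and rebuild its index.

    sstables is ordered oldest to newest; on duplicate keys the newest value
    wins (a simplifying assumption for this sketch).
    """
    # Tag rows with the negated table age so that, among equal keys,
    # rows from newer tables come out of the merge first.
    tagged = [
        [(key, -age, value) for key, value in table]
        for age, table in enumerate(sstables)
    ]
    merged: List[Row] = []
    index: List[Tuple[str, int]] = []
    last_key = None
    for key, _neg_age, value in heapq.merge(*tagged):
        if key == last_key:
            continue  # older duplicate of a key that was already emitted
        index.append((key, len(merged)))  # offset = position in the new data file
        merged.append((key, value))
        last_key = key
    return merged, index
```

For example, compacting `[("a", "1"), ("c", "3")]` (older) with `[("b", "2"), ("c", "30")]` (newer) yields the rows `a`, `b`, `c` with `c` mapped to `"30"`, and an index of `("a", 0), ("b", 1), ("c", 2)`. `heapq.merge` streams the inputs lazily, which mirrors why merging already-sorted files is cheap: no full re-sort is needed.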