[Cassandra Wiki] Update of "ArchitectureOverview" by tuxracer69

Apache Wiki Sat, 14 Nov 2009 06:57:51 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.


The "ArchitectureOverview" page has been changed by tuxracer69.
http://wiki.apache.org/cassandra/ArchitectureOverview?action=diff&rev1=1&rev2=2

--------------------------------------------------

  Architecture details
  
  
- O(1) node lookup Explicit replication Eventually consistent
+  * O(1) node lookup 
+  * Explicit replication 
+  * Eventually consistent
  
  
  
  
  
  Architecture layers
- Messaging service Gossip Failure detection Cluster state Partitioner 
Replication Commit log Memtable SSTable Indexes Compaction Tombstones Hinted 
handoff Read repair Bootstrap Monitoring Admin tools
  
- Writes
+ 
+  * Messaging service 
+  * Gossip 
+  * Failure detection 
+  * Cluster state 
+  * Partitioner 
+  * Replication 
+ 
+  * Commit log 
+  * Memtable 
+  * SSTable 
+  * Indexes 
+  * Compaction 
+ 
+  * Tombstones 
+  * Hinted handoff 
+  * Read repair 
+  * Bootstrap 
+  * Monitoring 
+  * Admin tools
+ 
+ == Writes ==
  
  
  Any node; Partitioner; Commitlog, memtable; SSTable; Compaction; Wait for W 
responses
  
  
+ Write model:
  
+ There are two write modes:
+  * ''Quorum write'': blocks until quorum is reached
+  * ''Async write'': sends request to any node. That node will push the data 
to appropriate nodes but return to client immediately
  
  
+ If a node is down, the write goes to another node with a hint saying where it 
should have been written. A harvester runs every 15 minutes, finds the hints, 
and moves the data to the appropriate node.
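
The quorum write and hinted handoff described above can be sketched roughly as follows. This is a minimal in-memory illustration, not Cassandra's implementation: `Node`, `quorum_write`, and `deliver_hints` are hypothetical names, the "blocking" quorum wait is modelled as a plain synchronous loop, and the choice of which node holds the hint is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a cluster node."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)
    # Hints held on behalf of other nodes: (target name, key, value).
    hints: list = field(default_factory=list)


def quorum_write(replicas, fallback, key, value):
    """Write to every live replica; leave a hint on `fallback` for each down one.

    Returns True when a majority of the replicas acknowledged the write.
    """
    acks = 0
    for node in replicas:
        if node.up:
            node.data[key] = value
            acks += 1
        else:
            fallback.hints.append((node.name, key, value))
    return acks >= len(replicas) // 2 + 1


def deliver_hints(holder, nodes_by_name):
    """Periodic pass (every 15 minutes in the text): replay hints to recovered nodes."""
    remaining = []
    for target, key, value in holder.hints:
        node = nodes_by_name[target]
        if node.up:
            node.data[key] = value
        else:
            remaining.append((target, key, value))  # still down, keep the hint
    holder.hints = remaining
```

With three replicas of which one is down, `quorum_write` still succeeds (two acknowledgements out of three), and a later `deliver_hints` call moves the hinted value to the recovered node.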
  
+ === Write path ===
+ At write time:
+  * The write first goes to a '''disk commit log''' (sequential).
+  * After the write to the log, it is sent to the appropriate nodes.
+  * Each node receiving the write first records it in a local log, then updates 
the appropriate '''memtables''' (one for each column family). A Memtable 
is Cassandra's in-memory representation of key/value pairs before the data 
gets flushed to disk as an SSTable.
+  * '''Memtables''' are flushed to disk when:
+    * Out of space
+    * Too many keys (128 is default)
+    * Time duration (client provided – no cluster clock)
+  * When a memtable is written out, two files are produced:
+    * Data File ('''SSTable'''). An SSTable (terminology borrowed from Google) 
stands for Sorted Strings Table and is a file of key/value string pairs, sorted 
by keys.
+    * Index File ('''SSTable Index'''). (Similar to Hadoop !MapFile / TFile)
+      * (Key, offset) pairs (points into data file)
+      * Bloom filter (all keys in data file)
+  * When a commit log has had all its column families pushed to disk, it is 
deleted
+  * '''Compaction''': Data files accumulate over time. Periodically, data 
files are merge-sorted into a new file (and a new index is created)
+    * Merge keys 
+    * Combine columns 
+    * Discard tombstones
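
The write path above (commit log, memtable, flush to a sorted SSTable with a key index, then compaction) can be sketched on a single node as follows. This is an illustrative assumption-laden model, not Cassandra code: `ColumnFamilyStore` and its methods are hypothetical names, Python lists stand in for on-disk files, only the key-count flush trigger is modelled, and the Bloom filter and tombstone handling are omitted.

```python
import bisect


class ColumnFamilyStore:
    """Hypothetical single-node sketch: commit log -> memtable -> SSTable."""

    def __init__(self, max_keys=128):
        self.max_keys = max_keys   # flush threshold (key count only)
        self.commit_log = []       # stands in for the sequential on-disk log
        self.memtable = {}         # in-memory key/value pairs
        self.sstables = []         # each entry: (sorted keys, sorted rows)

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record in the log first
        self.memtable[key] = value            # 2. then update the memtable
        if len(self.memtable) >= self.max_keys:
            self.flush()

    def flush(self):
        """Write the memtable out as a sorted data 'file' plus a key index."""
        if not self.memtable:
            return
        rows = sorted(self.memtable.items())
        keys = [k for k, _ in rows]           # index: keys[i] points at rows[i]
        self.sstables.append((keys, rows))
        self.memtable = {}
        self.commit_log = []                  # everything logged is now flushed

    def compact(self):
        """Merge all SSTables into one; the newest value for a key wins."""
        merged = {}
        for _, rows in self.sstables:         # oldest first, newer overwrite
            merged.update(rows)
        rows = sorted(merged.items())
        self.sstables = [([k for k, _ in rows], rows)] if rows else []

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for keys, rows in reversed(self.sstables):  # newest SSTable first
            i = bisect.bisect_left(keys, key)       # binary search in the index
            if i < len(keys) and keys[i] == key:
                return rows[i][1]
        return None
```

Because each flush produces a file sorted by key, compaction only has to merge already-sorted inputs; the sketch re-sorts a dictionary for brevity where a real merge would stream the files.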
  
  
  
  
  
+ == Remove ==
- Memtable / SSTable
- 
- Disk
- Commit log
- 
- SSTable format
- 
- 
- Key / data
- 
- SSTable Indexes
- 
- 
- Bloom filter Key Column
- 
- 
- 
- 
- 
- (Similar to Hadoop MapFile / Tfile)
- 
- Compaction
- 
- 
- Merge keys Combine columns Discard tombstones
- 
- 
- 
- 
- 
- Remove
  
  
  Deletion marker (tombstone) necessary to suppress data in older SSTables, 
until compaction. Read repair complicates things a little. Eventually consistent 
complicates things more. Solution: configurable delay before tombstone GC, after 
which tombstones are not repaired.
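
The role of the tombstone and of the configurable GC delay can be illustrated with a small compaction sketch. The names `TOMBSTONE`, `compact`, `now`, and `gc_grace` are hypothetical, SSTables are modelled as dictionaries of `key -> (timestamp, value)`, and timestamps are plain integers; none of this is taken from Cassandra's code.

```python
TOMBSTONE = object()  # hypothetical deletion marker


def compact(sstables, now, gc_grace):
    """Merge SSTables (dicts of key -> (timestamp, value)); newest timestamp wins.

    A tombstone is dropped only once it is older than `gc_grace`. Until then
    it is kept so that it can still suppress older copies of the key.
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return {
        key: (ts, value)
        for key, (ts, value) in merged.items()
        if not (value is TOMBSTONE and now - ts > gc_grace)
    }
```

If the tombstone were discarded immediately, a replica that missed the delete could reintroduce the old value through read repair; keeping it for the grace period gives the deletion time to reach every replica.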
@@ -154, +171 @@

  
  
  
- Read path
+ == Read path ==
  
  
  Any node; Partitioner; Wait for R responses; Wait for N - R responses in the 
background and perform read repair
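
The read path summarized above can be sketched as follows, under simplifying assumptions: replicas are plain dictionaries of `key -> (timestamp, value)`, the function name `read` and parameter `r` are hypothetical, and the remaining N - R responses are consulted inline rather than in the background.

```python
def read(replicas, key, r):
    """Return the newest value among the first `r` replica responses, then
    repair every replica (of all N) whose copy is missing or stale.

    `replicas` is a list of dicts mapping key -> (timestamp, value).
    """
    responses = [rep.get(key) for rep in replicas[:r]]
    found = [resp for resp in responses if resp is not None]
    result = max(found, key=lambda tv: tv[0]) if found else None

    # Read repair: compare all N copies and push the newest to stale replicas.
    all_found = [rep[key] for rep in replicas if key in rep]
    if all_found:
        newest = max(all_found, key=lambda tv: tv[0])
        for rep in replicas:
            if rep.get(key) != newest:
                rep[key] = newest
    return result
```

The caller only waits for `r` answers, so a larger `r` trades latency for a better chance of seeing the latest write; the repair step is what moves the replicas back toward agreement.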
