Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "ArchitectureOverview" page has been changed by tuxracer69.
http://wiki.apache.org/cassandra/ArchitectureOverview

--------------------------------------------------

New page:
(WORK IN PROGESS!)

This is an overview of Cassandra architecture aimed at Cassandra users.

Developers should probably look at the Developers links on the wiki's 
[[../|front page]]

Information is mainly based on 
[[http://assets.en.oreilly.com/1/event/27/Cassandra_%20Open%20Source%20Bigtable%20+%20Dynamo%20Presentation.pdf|J
 Ellis OSCON 09 presentation ]]


== Motivation ==


Scaling reads to a relational database is hard Scaling writes to a relational 
database is virtually impossible


... and when you do, it usually isn't relational anymore

* The new face of data


Scale out, not up Online load balancing, cluster growth Flexible schema 
Key-oriented queries CAP-aware


 * CAP theorem


Pick two of Consistency, Availability, Partition tolerance

Two famous papers

 * Bigtable: A distributed storage system for structured data, 2006 
 * Dynamo: amazon's highly available keyvalue store, 2007



Two approaches


 * Bigtable: "How can we build a distributed db on top of GFS?" 
 * Dynamo: "How can we build a distributed hash table appropriate for the data 
center?"



10,000 ft summary


 * Dynamo partitioning and replication 
 * Log-structured ColumnFamily data model similar to Bigtable's



Cassandra highlights


 * High availability 
 * Incremental scalability 
 * Eventually consistent 
 * Tunable tradeoffs between consistency and latency 
 * Minimal administration 
 * No SPF (Single Point of Failure)











Dynamo architecture & Lookup

Architecture details


O(1) node lookup Explicit replication Eventually consistent





Architecture layers
Messaging service Gossip Failure detection Cluster state Partitioner 
Replication Commit log Memtable SSTable Indexes Compaction Tombstones Hinted 
handoff Read repair Bootstrap Monitoring Admin tools

Writes


Any node Partitioner Commitlog, memtable SSTable Compaction Wait for W responses











Memtable / SSTable

Disk
Commit log

SSTable format


Key / data

SSTable Indexes


Bloom filter Key Column





(Similar to Hadoop MapFile / Tfile)

Compaction


Merge keys Combine columns Discard tombstones





Remove


Deletion marker (tombstone) necessary to suppress data in older SSTables, until 
compaction Read repair complicates things a little Eventually consistent 
complicates things more Solution: configurable delay before tombstone GC, after 
which tombstones are not repaired







Cassandra write properties


No reads No seeks Fast Atomic within ColumnFamily Always writable









Read path


Any node Partitioner Wait for R responses Wait for N ­ R responses in the 
background and perform read repair







Cassandra read properties


Read multiple SSTables Slower than writes (but still fast) Seeks can be 
mitigated with more RAM Scales to billions of rows







Consistency in a BASE world


If W + R > N, you will have consistency W=1, R=N W=N, R=1 W=Q, R=Q where Q = N 
/ 2 + 1







vs MySQL with 50GB of data


MySQL


~300ms write ~350ms read ~0.12ms write ~15ms read

Reply via email to