[Cassandra Wiki] Update of "FAQ" by JonathanEllis

Apache Wiki Mon, 25 Jan 2010 15:06:49 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.


The "FAQ" page has been changed by JonathanEllis.
The comment on this change is: add "Why are reads slower than writes?".
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=32&rev2=33

--------------------------------------------------

   * [[#architecture|What are SSTables and Memtables?]]
   * [[#working_with_timeuuid_in_java|Why is it so hard to work with 
TimeUUIDType in Java?]]
   * [[#i_deleted_what_gives|I delete data from Cassandra, but disk usage stays 
the same. What gives?]]
+  * [[#reads_slower_writes|Why are reads slower than writes?]]
  
  <<Anchor(cant_listen_on_ip_any)>>
  == Why can't I make Cassandra listen on 0.0.0.0 (all my addresses)? ==
@@ -157, +158 @@

  == I delete data from Cassandra, but disk usage stays the same. What gives? ==
  Data you write to Cassandra gets persisted to SSTables. Since SSTables are 
immutable, the data can't actually be removed when you perform a delete. 
Instead, a marker (also called a "tombstone") is written to indicate the 
value's new status. Never fear, though: on the first compaction that occurs 
after ''GCGraceSeconds'' (hint: storage-conf.xml) have expired, the data will 
be expunged completely and the corresponding disk space recovered.
  
+ <<Anchor(reads_slower_writes)>>
+ == Why are reads slower than writes? ==
+ Unlike all major relational databases and some NoSQL systems, Cassandra does 
not use b-trees and in-place updates on disk.  Instead, it uses an 
sstable/memtable model like Bigtable's: writes to each ColumnFamily are grouped 
together in an in-memory structure before being flushed (sorted and written to 
disk).  Thus, writes are extremely fast, costing only a commitlog append and an 
amortized sequential write for the flush.  This means that writes cost no 
random I/O, compared to a b-tree system, which not only has to seek to the data 
location to overwrite, but also may have to seek to read different levels of 
the index if it outgrows disk cache!
+ 
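The write path described above can be sketched as a toy model. This is only an illustration under stated assumptions, not Cassandra's actual code: the class name, the `flush_threshold` parameter, and the use of in-memory lists to stand in for the commitlog and on-disk sstables are all hypothetical.

```python
class ToyColumnFamilyStore:
    """Toy model of the write path; every name here is hypothetical."""

    def __init__(self, flush_threshold=3):
        self.commitlog = []   # stands in for the append-only commitlog
        self.memtable = {}    # in-memory structure for one ColumnFamily
        self.sstables = []    # each flush adds one immutable, sorted run
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commitlog.append((key, value))  # sequential append, no seek
        self.memtable[key] = value           # update in memory only
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort once and emit the whole run in one sequential pass.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


store = ToyColumnFamilyStore(flush_threshold=2)
for key, value in [("b", 1), ("a", 2), ("c", 3)]:
    store.write(key, value)
```

Nothing in `write` touches previously flushed data, which is the point of the model: existing sstables are never modified in place.
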
+ The downside is that on a read, Cassandra has to (potentially) merge row 
fragments from multiple sstables on disk.  We think this is a tradeoff worth 
making, first because scaling writes has always been harder than scaling reads, 
and second because, as your data corpus grows, Cassandra's read disadvantage 
narrows versus b-tree systems that have to do multiple seeks against a large 
index.
+
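The read-side cost can be illustrated with a similar minimal sketch. It assumes each sstable is modelled as a dict of `{row_key: {column: (timestamp, value)}}` and that the fragment with the newest timestamp wins for each column; the function name `read_row` and the sample data are hypothetical.

```python
def read_row(key, sstables):
    """Merge the fragments of one row found across several sstables.

    Each sstable is modelled as {row_key: {column: (timestamp, value)}}.
    """
    merged = {}
    for sstable in sstables:  # potentially one disk read per sstable
        fragment = sstable.get(key, {})
        for column, (ts, value) in fragment.items():
            # Keep the most recently written version of each column.
            if column not in merged or ts > merged[column][0]:
                merged[column] = (ts, value)
    return {column: value for column, (ts, value) in merged.items()}


older = {"user1": {"name": (1, "Ann"), "city": (1, "Oslo")}}
newer = {"user1": {"city": (2, "Rome")}}
row = read_row("user1", [older, newer])
```

The loop over `sstables` is where the extra read work comes from: the more sstables hold a fragment of the row, the more places a read has to consult.
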
