Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "FAQ" page has been changed by JonathanEllis.
The comment on this change is: add "Why are reads slower than writes?".
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=32&rev2=33

--------------------------------------------------

   * [[#architecture|What are SSTables and Memtables?]]
   * [[#working_with_timeuuid_in_java|Why is it so hard to work with TimeUUIDType in Java?]]
   * [[#i_deleted_what_gives|I delete data from Cassandra, but disk usage stays the same. What gives?]]
+  * [[#reads_slower_writes|Why are reads slower than writes?]]

  <<Anchor(cant_listen_on_ip_any)>>
  == Why can't I make Cassandra listen on 0.0.0.0 (all my addresses)? ==
@@ -157, +158 @@

  == I delete data from Cassandra, but disk usage stays the same. What gives? ==
  Data you write to Cassandra gets persisted to SSTables. Since SSTables are immutable, the data can't actually be removed when you perform a delete. Instead, a marker (also called a "tombstone") is written to indicate the value's new status. Never fear, though: on the first compaction that occurs after ''GCGraceSeconds'' (hint: storage-conf.xml) has expired, the data will be expunged completely and the corresponding disk space recovered.

+ <<Anchor(reads_slower_writes)>>
+ == Why are reads slower than writes? ==
+ Unlike all major relational databases and some NoSQL systems, Cassandra does not use b-trees and in-place updates on disk. Instead, it uses an sstable/memtable model like Bigtable's: writes to each ColumnFamily are grouped together in an in-memory structure before being flushed (sorted and written to disk). Thus, writes are extremely fast, costing only a commitlog append and an amortized sequential write for the flush. This means that writes cost no random I/O, whereas a b-tree system not only has to seek to the data location to overwrite it, but may also have to seek to read different levels of the index if the index outgrows the disk cache!
+
+ The downside is that on a read, Cassandra has to (potentially) merge row fragments from multiple sstables on disk. We think this is a tradeoff worth making, first because scaling writes has always been harder than scaling reads, and second because as your data corpus grows, Cassandra's read disadvantage narrows relative to b-tree systems, which have to do multiple seeks against a large index.
+
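
The write and read paths described in the new FAQ entry can be illustrated with a small in-memory sketch. This is not Cassandra's code: `ToyColumnFamily`, `flush_threshold`, `TOMBSTONE`, and the integer clock are hypothetical names invented for the illustration, the "sstables" are plain Python dicts rather than files on disk, and compaction and ''GCGraceSeconds'' are not modeled. It only shows why a write touches a log and a memtable, while a read may have to merge fragments of the same row from several sstables.

```python
from typing import Any, Dict, List, Tuple

TOMBSTONE = object()  # hypothetical stand-in for a delete marker

# A row fragment maps column name -> (timestamp, value).
Fragment = Dict[str, Tuple[int, Any]]


class ToyColumnFamily:
    """Toy memtable/sstable model; assumptions only, not Cassandra internals."""

    def __init__(self, flush_threshold: int = 2) -> None:
        self.commitlog: List[Tuple[str, str, int, Any]] = []
        self.memtable: Dict[str, Fragment] = {}
        self.sstables: List[Dict[str, Fragment]] = []  # never modified after flush
        self.flush_threshold = flush_threshold  # rows held before flushing
        self.clock = 0  # simplistic logical timestamp

    def write(self, row: str, column: str, value: Any) -> None:
        self.clock += 1
        # Write cost: one sequential log append plus an in-memory update.
        self.commitlog.append((row, column, self.clock, value))
        self.memtable.setdefault(row, {})[column] = (self.clock, value)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def delete(self, row: str, column: str) -> None:
        # A delete is just another write: it records a tombstone.
        self.write(row, column, TOMBSTONE)

    def flush(self) -> None:
        # Sort rows by key and freeze them as a new "sstable".
        if self.memtable:
            self.sstables.append({r: self.memtable[r] for r in sorted(self.memtable)})
            self.memtable = {}

    def read(self, row: str) -> Tuple[Dict[str, Any], int]:
        # Read cost: collect every fragment of the row, then merge them.
        sources = [self.memtable, *self.sstables]
        fragments = [src[row] for src in sources if row in src]
        merged: Fragment = {}
        for fragment in fragments:
            for column, (ts, value) in fragment.items():
                # The newest timestamp wins for each column.
                if column not in merged or ts > merged[column][0]:
                    merged[column] = (ts, value)
        live = {c: v for c, (_, v) in merged.items() if v is not TOMBSTONE}
        return live, len(fragments)


cf = ToyColumnFamily(flush_threshold=1)  # flush after every write, for illustration
cf.write("alice", "email", "a@example.com")
cf.write("alice", "city", "Austin")
cf.write("alice", "email", "alice@example.com")
cf.delete("alice", "city")
columns, fragments_merged = cf.read("alice")
print(columns, fragments_merged)
```

In this sketch each write is a constant amount of work regardless of how much data already exists, whereas the single read of "alice" has to consult every sstable that holds a fragment of that row. The tombstoned column stays stored until something (compaction, in the real system) removes it.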
