There is no such thing as a column or supercolumn that is not contained in a ColumnFamily. The ColumnFamily is the structure that is stored together on disk.
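As a rough illustration of the layout described in this reply, here is a toy Python model of a ColumnFamily as a single sequence of entries sorted by row key, then column name. It is only a sketch: the names `build_column_family`, `key_range` and `one_column`, and the sample rows, are made up for illustration, and this is not Cassandra's actual file format or API.

```python
from bisect import bisect_left

# Hypothetical sample data: row key -> {column name -> value}.
rows = {
    "user2": {"name": "Bob", "email": "bob@example.com"},
    "user1": {"name": "Ann", "email": "ann@example.com"},
    "user3": {"name": "Cy"},
}

def build_column_family(rows):
    """Flatten nested rows into entries sorted by row key, then column name."""
    return sorted(
        ((key, col), value)
        for key, cols in rows.items()
        for col, value in cols.items()
    )

def key_range(cf, start_key, end_key):
    """Contiguous keys: one seek to start_key, then a sequential read."""
    names = [name for name, _ in cf]
    lo = bisect_left(names, (start_key, ""))
    out = []
    for (key, col), value in cf[lo:]:
        if key > end_key:
            break
        out.append((key, col, value))
    return out

def one_column(cf, keys, column):
    """One column across several keys: a separate seek for every key."""
    names = [name for name, _ in cf]
    out = {}
    for key in keys:
        i = bisect_left(names, (key, column))
        if i < len(cf) and names[i] == (key, column):
            out[key] = cf[i][1]
    return out

cf = build_column_family(rows)
print(key_range(cf, "user1", "user2"))
print(one_column(cf, ["user1", "user3"], "name"))
```

In this model the range read touches one contiguous slice of the list, while the single-column read has to jump to a new position for each key, which is the access-pattern difference in question.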
A supercolumn is not what you think it is: supercolumns are like regular columns, except they contain other columns, and you can have an almost infinite number of supercolumns within a SuperColumnFamily.

A ColumnFamily is laid out on disk as a sequence of values sorted by key, then by (super)column name (or column timestamp), then by subcolumn name/timestamp. Therefore, it is very fast to get contiguous keys from the ColumnFamily, but to get a single column name from multiple keys Cassandra still needs to seek to the next interesting column on disk.

There is no concept of 'blocks' in the Cassandra representation, because it does not use a B-Tree to store data. There is an index for each ColumnFamily on disk that allows Cassandra to seek directly to a key in the sorted file.

Please see http://wiki.apache.org/cassandra/DataModel

Thanks,
Stu

-----Original Message-----
From: "Ivan Chang" <[email protected]>
Sent: Wednesday, July 1, 2009 3:00pm
To: [email protected]
Subject: How does Cassandra store data physically?

I am wondering how Cassandra stores its columns and super columns in the database files. A supercolumn logically groups a set of related columns together; when the supercolumn is written to file, are the columns also stored in blocks adjacent to each other so that IO cost is minimized for related data? What about individual columns not associated with any supercolumn, but related only through a given key?

Thanks,
Ivan
