There is no such thing as a column or supercolumn that is not contained in a 
ColumnFamily. The ColumnFamily is the structure that is stored together on disk.

A supercolumn is not quite what you describe: supercolumns are like regular 
columns, except that they contain other columns, and a SuperColumnFamily can 
hold an almost unlimited number of supercolumns.

A ColumnFamily is laid out on disk as a sequence of values sorted by key, then 
by (super)column name (or column timestamp), then by subcolumn name/timestamp. 
Therefore, it is very fast to get contiguous keys from the ColumnFamily, but to 
get a single column name from multiple keys Cassandra still needs to seek to 
the next interesting column on disk.
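
To make the sort order concrete, here is a rough sketch in Python. It is not 
Cassandra code: the `rows` list, `key_range` and `column_for_keys` are 
hypothetical names, and a sorted in-memory list stands in for the on-disk file. 
It only illustrates why a range of keys is one contiguous read while one column 
across several keys costs one seek per key.

```python
from bisect import bisect_left

# Hypothetical stand-in for one ColumnFamily file: (key, column_name, value)
# tuples kept sorted by key, then by column name.
rows = sorted([
    ("alice", "age", "30"),
    ("alice", "email", "a@example.com"),
    ("bob", "age", "25"),
    ("bob", "email", "b@example.com"),
    ("carol", "age", "41"),
    ("carol", "email", "c@example.com"),
])

def key_range(rows, start_key, end_key):
    """Return every entry with start_key <= key <= end_key as one slice."""
    lo = bisect_left(rows, (start_key,))
    # end_key + "\x00" sorts just after every entry belonging to end_key.
    hi = bisect_left(rows, (end_key + "\x00",))
    return rows[lo:hi]

def column_for_keys(rows, keys, column):
    """Fetch one column for several keys; count one seek per key."""
    found = {}
    seeks = 0
    for key in keys:
        i = bisect_left(rows, (key, column))  # jump to where the column would be
        seeks += 1
        if i < len(rows) and rows[i][:2] == (key, column):
            found[key] = rows[i][2]
    return found, seeks
```

Calling `key_range(rows, "alice", "bob")` touches one contiguous slice, while 
`column_for_keys(rows, ["alice", "carol"], "email")` has to jump once per key.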

There is no concept of 'blocks' in the Cassandra representation, because it 
does not use a B-Tree to store data. There is an index for each ColumnFamily on 
disk that allows Cassandra to seek directly to a key in the sorted file.
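
As a toy illustration of the index idea (again, not the real file format): the 
sketch below writes records in key order and keeps a key -> byte offset map, so 
a lookup is one seek plus one read. The names `write_sorted` and `read_key`, 
the tab-separated record layout, and the use of an in-memory buffer instead of 
a file are all assumptions made for the example.

```python
import io

def write_sorted(records):
    """Write (key, value) records in key order.

    Returns the data buffer and an index mapping each key to the byte
    offset where its record starts.
    """
    data = io.BytesIO()
    index = {}
    for key, value in sorted(records):
        index[key] = data.tell()  # remember where this record begins
        data.write(f"{key}\t{value}\n".encode("utf-8"))
    return data, index

def read_key(data, index, key):
    """Look up one key by seeking straight to its recorded offset."""
    offset = index.get(key)
    if offset is None:
        return None  # key is not in this file
    data.seek(offset)
    line = data.readline().decode("utf-8").rstrip("\n")
    _stored_key, value = line.split("\t", 1)
    return value
```

For example, after `data, index = write_sorted([("bob", "25"), ("alice", "30")])`, 
`read_key(data, index, "bob")` goes directly to bob's offset without scanning 
alice's record first.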

Please see http://wiki.apache.org/cassandra/DataModel

Thanks,
Stu

-----Original Message-----
From: "Ivan Chang" <[email protected]>
Sent: Wednesday, July 1, 2009 3:00pm
To: [email protected]
Subject: How does Cassandra store data physically?

I am wondering how Cassandra stores its columns, super columns in the
database files?

A supercolumn logically groups a set of related columns together, when the
supercolumn is written to file, are the columns also stored in adjacent
blocks to each other so IO cost is minimized for related data?  What about
individual columns not associated with any supercolumn, but related only
through a given key?

Thanks,
Ivan

