On 07/13/2012 08:00 PM, Michael Theroux wrote:
Hello,

I've been trying to understand in greater detail how SStables are stored, and 
how information is transferred between Cassandra nodes, especially when a new 
node is joining a cluster.

Specifically, is information stored to SStables ordered by rowkeys?  Some of 
the articles I've read suggest this is the case (although they are a little vague 
about whether they actually mean that the columns are stored in order, not the rowkeys).  
However, if data is stored in rowkey order, how is this achieved, given that sstables 
are immutable?

Thanks for any insights,
-Mike

It depends on which partitioner you use. You should be using the RandomPartitioner, and if so, the rows are sorted by the hash of the row key. There are partitioners that sort by the raw key value, but they shouldn't be used, as they tend to partition data unevenly across nodes.
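
To make "sorted by the hash of the row key" concrete, here is a rough Python sketch. The names token() and sort_by_token() are made up for illustration and are not Cassandra APIs, and the hash is a simplified stand-in (the MD5 digest read as an integer) for what RandomPartitioner computes.

```python
import hashlib

def token(row_key: bytes) -> int:
    # Simplified stand-in for a RandomPartitioner token:
    # the MD5 digest of the row key, read as a big-endian integer.
    return int.from_bytes(hashlib.md5(row_key).digest(), "big")

def sort_by_token(row_keys):
    # Order rows by hash token instead of by the raw key value.
    return sorted(row_keys, key=token)

keys = [b"alice", b"bob", b"carol", b"dave"]
ordered = sort_by_token(keys)
for key in ordered:
    print(key, token(key))
```

The resulting order has nothing to do with the alphabetical order of the keys, which is what spreads rows evenly across the cluster.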

As for how this is done, remember that an sstable doesn't hold all the data for a column family. Not only does the data for a column family exist on multiple servers, there are usually multiple sstable files on disk that represent data from one column family on one machine. So at the time an sstable is written, the rows that are to be put in it are sorted and written out in sorted order. The file is never modified after that, so immutability doesn't get in the way. In fact, the same rowkey may be written to multiple sstables, one sstable having one set of columns for the key, another sstable having other columns for the same key.
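
A toy sketch of that write path, in Python: each batch of rows is sorted once and frozen, and nothing is rewritten afterwards. The function flush() and the dict-of-dicts layout are assumptions made for this example, not Cassandra's actual code or file format, and plain key order is used for brevity where RandomPartitioner would order by the key's hash.

```python
def flush(rows):
    """Freeze {row_key: {column: value}} into an immutable, key-sorted run.

    Hypothetical helper for illustration only. Tuples stand in for an
    on-disk sstable file that is written once and never changed.
    """
    return tuple(
        (key, tuple(sorted(rows[key].items())))
        for key in sorted(rows)
    )

# Two separate writes produce two separate sorted runs.
first = flush({"k2": {"name": "bob"}, "k1": {"name": "alice"}})
second = flush({"k1": {"email": "a@example.com"}, "k3": {"name": "carol"}})

print(first)
print(second)
```

Note that "k1" ends up in both runs, each holding a different set of columns for that key.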

When a row is queried by key, Cassandra is responsible for finding which sstables (potentially several) hold columns for that key and merging the results.
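
The merge on read can be sketched roughly as follows. read_row() and the in-memory dicts are hypothetical stand-ins for illustration. The sketch assumes each column carries a write timestamp and that the newest timestamp wins when the same column appears in more than one sstable.

```python
def read_row(row_key, sstables):
    """Collect the columns for row_key from every sstable and merge them."""
    merged = {}
    for sstable in sstables:
        for column, (value, ts) in sstable.get(row_key, {}).items():
            # Keep the most recently written version of each column.
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return {column: value for column, (value, ts) in merged.items()}

# Each sstable maps row_key -> {column: (value, timestamp)}.
sstable_a = {"k1": {"name": ("alice", 1), "city": ("Boston", 1)}}
sstable_b = {"k1": {"email": ("a@example.com", 2), "city": ("Denver", 2)}}

row = read_row("k1", [sstable_a, sstable_b])
print(row)
```

Here the row comes back with columns from both sstables, and the newer "city" value replaces the older one.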
