[Cassandra Wiki] Update of "CassandraLimitations" by JonathanEllis

Apache Wiki Mon, 31 Aug 2009 13:59:23 -0700

The following page has been changed by JonathanEllis:
http://wiki.apache.org/cassandra/CassandraLimitations

------------------------------------------------------------------------------
  = Limitations =
  
- From easiest to fix to hardest:
+ == Inherent in the design ==
+ 
+ The main limitation on column and supercolumn size is that all data for a 
single key and column must fit (on disk) on a single machine in the cluster. 
Because keys alone are used to determine the nodes responsible for replicating 
their data, the amount of data associated with a single key has this upper 
bound. This is an inherent limitation of the distribution model.
+ 
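Why the key alone bounds the size of a row can be sketched with a simplified token ring: each node owns a token, a key is hashed to a token, and the key's replicas are the next N nodes clockwise. The node names, token values, MD5-based hashing, and the `token_for` / `replicas_for` helpers below are illustrative assumptions, not Cassandra's code. The point is that the column name never enters the calculation, so every column under one key lands on the same machines.

```python
import bisect
import hashlib

# Hypothetical 4-node ring of (token, node) pairs, sorted by token.
# Real clusters assign tokens differently; these values are illustrative.
RING = [(2**125, "node-a"), (2**126, "node-b"), (3 * 2**125, "node-c"), (2**127, "node-d")]
TOKENS = [token for token, _ in RING]


def token_for(key: str) -> int:
    """Map a row key onto the ring (MD5 here is an assumption for the sketch)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % 2**127


def replicas_for(key: str, replication_factor: int = 2) -> list:
    """Return the nodes that hold every column of `key`.

    Only the key is hashed; column names play no part, so all data for a
    single key is stored on the same `replication_factor` machines.
    """
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(TOKENS, token_for(key)) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]


# Any column written under "user:42" goes to these same nodes.
print(replicas_for("user:42"))
```

Because placement is a function of the key only, the data under one key can never be spread across more disks than its replicas have, which is the upper bound described above.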
+ == Artifacts of the current code base ==
  
   * Cassandra's compaction code currently deserializes an entire row (per 
columnfamily) at a time, so all the data for a given columnfamily/key pair 
must fit in memory. This is relatively easy to fix: columns are stored 
in order on disk, so there is no inherent need to deserialize a row at a 
time; doing so is simply easier given the current encapsulation of 
functionality.
   * Cassandra has two levels of indexes: key and column.  But in super 
columnfamilies there is a third level of subcolumns; these are not indexed, and 
any request for a subcolumn deserializes _all_ the subcolumns in that 
supercolumn.  So you want to avoid a data model that requires large numbers of 
subcolumns.  This can be fixed; the core classes involved are SuperColumn and 
SequenceFile.
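The compaction fix suggested in the first bullet relies on columns being stored in sorted order, which lets versions of a row from several files be merged column by column instead of deserializing each row whole. A minimal sketch of that idea, assuming each file's copy of the row is available as an iterator of `(name, timestamp, value)` tuples sorted by name; the tuple layout, the `merge_row_columns` helper, and newest-timestamp-wins resolution are simplifying assumptions, not Cassandra's compaction code.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple

# (column name, timestamp, value): a hypothetical on-disk column layout.
Column = Tuple[str, int, bytes]


def merge_row_columns(*sources: Iterable[Column]) -> Iterator[Column]:
    """Lazily merge several name-sorted column streams for one row.

    heapq.merge holds only one pending column per source, and groupby
    yields one column name at a time, so the full row is never built
    in memory.
    """
    merged = heapq.merge(*sources, key=itemgetter(0))
    for _name, versions in groupby(merged, key=itemgetter(0)):
        # For each column name, keep the version with the highest timestamp.
        yield max(versions, key=itemgetter(1))


older = iter([("a", 1, b"x"), ("c", 1, b"y")])
newer = iter([("a", 2, b"z"), ("b", 2, b"w")])
print(list(merge_row_columns(older, newer)))
# [('a', 2, b'z'), ('b', 2, b'w'), ('c', 1, b'y')]
```

Memory use here grows with the number of input files rather than with the width of the row.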
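The cost of unindexed subcolumns described in the second bullet can be illustrated with a toy model. This is not the SuperColumn or SequenceFile code: the length-prefixed record layout, the per-subcolumn offset index (a simplification; a real index might sample blocks of columns instead), and all helper names are hypothetical. The unindexed lookup must decode every subcolumn to find one; the indexed lookup binary-searches the names and decodes a single record.

```python
import bisect
import struct
from typing import Dict, List, Optional, Tuple


def encode(subcolumns: List[Tuple[bytes, bytes]]) -> Tuple[bytes, List[bytes], List[int]]:
    """Serialize name-sorted subcolumns as length-prefixed records.

    Also returns a hypothetical index: sorted names and their byte offsets.
    """
    buf = bytearray()
    names: List[bytes] = []
    offsets: List[int] = []
    for name, value in subcolumns:
        names.append(name)
        offsets.append(len(buf))
        buf += struct.pack(">I", len(name)) + name + struct.pack(">I", len(value)) + value
    return bytes(buf), names, offsets


def _read_record(buf: bytes, pos: int) -> Tuple[bytes, bytes, int]:
    """Decode one record at `pos`; return (name, value, next position)."""
    (name_len,) = struct.unpack_from(">I", buf, pos)
    name = buf[pos + 4 : pos + 4 + name_len]
    pos += 4 + name_len
    (value_len,) = struct.unpack_from(">I", buf, pos)
    value = buf[pos + 4 : pos + 4 + value_len]
    return name, value, pos + 4 + value_len


def lookup_unindexed(buf: bytes, target: bytes) -> Optional[bytes]:
    """No subcolumn index: decode every subcolumn, then pick one."""
    decoded: Dict[bytes, bytes] = {}
    pos = 0
    while pos < len(buf):
        name, value, pos = _read_record(buf, pos)
        decoded[name] = value
    return decoded.get(target)


def lookup_with_index(buf: bytes, names: List[bytes], offsets: List[int],
                      target: bytes) -> Optional[bytes]:
    """With a name index: binary-search, then decode a single record."""
    i = bisect.bisect_left(names, target)
    if i == len(names) or names[i] != target:
        return None
    _name, value, _next = _read_record(buf, offsets[i])
    return value


buf, names, offsets = encode([(b"a", b"1"), (b"b", b"22"), (b"c", b"333")])
print(lookup_unindexed(buf, b"b"), lookup_with_index(buf, names, offsets, b"b"))
```

Both functions return the same value; the difference is that the unindexed version's work grows with the total size of the supercolumn, which is why a data model with many subcolumns per supercolumn is costly.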
