Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The following page has been changed by JonathanEllis:
http://wiki.apache.org/cassandra/CassandraLimitations

------------------------------------------------------------------------------
  = Limitations =
- From easiest to fix to hardest:
+ == Inherent in the design ==
+
+ The main limitation on column and supercolumn size is that all data for a single key and column must fit (on disk) on a single machine in the cluster. Because keys alone are used to determine the nodes responsible for replicating their data, the amount of data associated with a single key has this upper bound. This is an inherent limitation of the distribution model.
+
+ == Artifacts of the current code base ==
   * Cassandra's compaction code currently deserializes an entire row (per columnfamily) at a time. So all the data from a given columnfamily/key pair must fit in memory. Fixing this is relatively easy since columns are stored in-order on disk so there is really no reason you have to deserialize row-at-a-time except that that is easier with the current encapsulation of functionality.
   * Cassandra has two levels of indexes: key and column. But in super columnfamilies there is a third level of subcolumns; these are not indexed, and any request for a subcolumn deserializes _all_ the subcolumns in that supercolumn. So you want to avoid a data model that requires large numbers of subcolumns. This can be fixed; the core classes involved are SuperColumn and SequenceFile.
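
The "keys alone determine the nodes" point can be illustrated with a small sketch. This is not Cassandra's code: the node names, the evenly spaced tokens, the replication factor, and the helpers `token_for`, `replicas_for`, and `place` are all hypothetical, and MD5 is used only as a convenient stand-in hash. What it shows is that the column name never enters the placement decision, so every column stored under a key ends up on the same set of machines.

```python
import hashlib
from bisect import bisect_left

# Hypothetical cluster: four nodes with evenly spaced tokens on a 128-bit ring.
TOKEN_SPACE = 2 ** 128
NODES = ["node-a", "node-b", "node-c", "node-d"]
RING = [(i * TOKEN_SPACE // len(NODES), name) for i, name in enumerate(NODES)]
TOKENS = [token for token, _ in RING]


def token_for(key):
    """Map a row key to a position on the ring (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def replicas_for(key, replication_factor=2):
    """Return the nodes holding `key`: the first node whose token is at or
    after the key's token, then the next nodes clockwise around the ring."""
    start = bisect_left(TOKENS, token_for(key)) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]


def place(key, column, replication_factor=2):
    """Placement of one (key, column) pair. The column is deliberately
    ignored: only the key decides which nodes store the data."""
    return replicas_for(key, replication_factor)


if __name__ == "__main__":
    for column in ("name", "email", "a-very-large-blob"):
        print(column, "->", place("user42", column))
```

Under these assumptions, adding more columns to "user42" never spreads the row across more machines, which is why the data for a single key is bounded by what one node can hold on disk.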
