Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "CassandraLimitations" page has been changed by Nick Pavlica.
http://wiki.apache.org/cassandra/CassandraLimitations?action=diff&rev1=8&rev2=9

--------------------------------------------------

   * The byte[] size of a value can't be more than 2^31-1; in other words, a 
single value is limited to just under 2GB.
   * Cassandra's compaction code currently deserializes an entire row (per 
columnfamily) at a time, so all the data for a given columnfamily/key pair 
must fit in memory.  Fixing this is relatively easy: since columns are stored 
in order on disk, there is no real need to deserialize a row at a time, 
except that doing so is easier with the current encapsulation of 
functionality.  This will be fixed in 
https://issues.apache.org/jira/browse/CASSANDRA-16
   * Cassandra has two levels of indexes: key and column.  But in super 
columnfamilies there is a third level of subcolumns; these are not indexed, 
and any request for a subcolumn deserializes _all_ the subcolumns in that 
supercolumn.  So you want to avoid a data model that requires large numbers 
of subcolumns (a rough sketch of this nesting follows this list).  
https://issues.apache.org/jira/browse/CASSANDRA-598 is open to remove this 
limitation.
-  * Cassandra's public API is based on Thrift, which offers no streaming 
abilities -- any value written or fetched has to fit in memory.  This is 
inherent to Thrift's design; I don't see it changing.  So adding large object 
support to Cassandra would need a special API that manually splits the large 
objects up into pieces.  Jonathan Ellis sketched out one approach in 
http://issues.apache.org/jira/browse/CASSANDRA-265.  As a workaround in the 
meantime, you can manually split files into chunks of whatever size you are 
comfortable with -- at least one person is using 64MB -- and make a file 
correspond to a row, with the chunks as column values.
+  * <<Anchor(streaming)>>Cassandra's public API is based on Thrift, which 
offers no streaming abilities -- any value written or fetched has to fit in 
memory.  This is inherent to Thrift's design; I don't see it changing.  So 
adding large object support to Cassandra would need a special API that 
manually splits the large objects up into pieces.  Jonathan Ellis sketched 
out one approach in http://issues.apache.org/jira/browse/CASSANDRA-265.  As a 
workaround in the meantime, you can manually split files into chunks of 
whatever size you are comfortable with -- at least one person is using 64MB 
-- and make a file correspond to a row, with the chunks as column values (a 
sketch of this chunking approach follows this list).
   * Thrift will crash Cassandra if you send random or malicious data to it.  
This makes exposing the Cassandra port directly to the outside internet a Bad 
Idea.  See http://issues.apache.org/jira/browse/CASSANDRA-475 and 
http://issues.apache.org/jira/browse/THRIFT-601 for details.
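
As a concrete sketch of the chunking workaround above, the following Java 
fragment splits a file into fixed-size pieces and stores each piece as one 
column value in a row keyed by the file path.  The 64MB figure is just the 
example size mentioned above, and insertChunk is a hypothetical placeholder 
for whatever client call you use to write a single column (for example the 
Thrift insert method) -- it is not part of Cassandra's API.

{{{
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

public class LargeObjectChunker {
    // Example chunk size; use whatever size you are comfortable with.
    private static final int CHUNK_SIZE = 64 * 1024 * 1024;

    // Hypothetical helper: write one column (name -> value) into the row
    // identified by rowKey, using your Thrift client of choice.
    static void insertChunk(String rowKey, String columnName, byte[] value) {
        // client.insert(...) goes here
    }

    // Split the file into CHUNK_SIZE pieces; the row key is the file path and
    // the column names encode the chunk order, so the file can be reassembled
    // by reading the row's columns back in name order and concatenating them.
    public static void storeFile(String path) throws IOException {
        FileInputStream in = new FileInputStream(path);
        try {
            byte[] buf = new byte[CHUNK_SIZE];
            int chunkIndex = 0;
            while (true) {
                // read() may return short counts, so fill the buffer in a loop.
                int filled = 0;
                int n;
                while (filled < buf.length
                        && (n = in.read(buf, filled, buf.length - filled)) != -1) {
                    filled += n;
                }
                if (filled == 0) {
                    break;  // end of file
                }
                insertChunk(path, String.format("chunk-%08d", chunkIndex++),
                        Arrays.copyOf(buf, filled));
                if (filled < buf.length) {
                    break;  // last, partial chunk
                }
            }
        } finally {
            in.close();
        }
    }
}
}}}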
  
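To make the subcolumn indexing point concrete, here is a rough Java picture 
of the nesting described in the list above.  This is only an illustration of 
the data model, not Cassandra's actual storage code: the point is that the 
subcolumns of a supercolumn are handled as a single unit, so fetching any one 
of them pays the cost of deserializing all of them.

{{{
import java.util.SortedMap;
import java.util.TreeMap;

// Illustration only: the three levels of a super columnfamily as described
// above.  Row keys and supercolumn names are indexed; subcolumn names are not.
public class SuperColumnShape {
    // row key -> supercolumn name -> (subcolumn name -> value)
    private final SortedMap<String, SortedMap<String, SortedMap<String, byte[]>>> rows =
            new TreeMap<String, SortedMap<String, SortedMap<String, byte[]>>>();

    // In this sketch, looking up one subcolumn goes through the whole inner
    // map; in Cassandra the analogous step deserializes every subcolumn in
    // the supercolumn before the one you asked for can be returned.
    public byte[] getSubColumn(String rowKey, String superColumn, String subColumn) {
        SortedMap<String, byte[]> allSubColumns = rows.get(rowKey).get(superColumn);
        return allSubColumns.get(subColumn);
    }
}
}}}
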
  == Obsolete Limitations ==
