Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "CassandraLimitations" page has been changed by TedZ.
The comment on this change is: mention random data vulnerability and clean up 
https link (should use HTTP unless SSL is really necessary).
http://wiki.apache.org/cassandra/CassandraLimitations?action=diff&rev1=6&rev2=7

--------------------------------------------------

  == Artifacts of the current code base ==
   * Cassandra's compaction code currently deserializes an entire row (per columnfamily) at a time, so all the data for a given columnfamily/key pair must fit in memory.  Fixing this is relatively easy: columns are stored in order on disk, so there is no inherent reason to deserialize a whole row at once; it is simply easier with the current encapsulation of functionality.
   * Cassandra has two levels of indexes: key and column.  But in super 
columnfamilies there is a third level of subcolumns; these are not indexed, and 
any request for a subcolumn deserializes _all_ the subcolumns in that 
supercolumn.  So you want to avoid a data model that requires large numbers of 
subcolumns.
-  * Cassandra's public API is based on Thrift, which offers no streaming 
abilities -- any value written or fetched has to fit in memory.  This is 
inherent to Thrift's design; I don't see it changing.  So adding large object 
support to Cassandra would need a special API that manually split the large 
objects up into pieces.  Jonathan Ellis sketched out one approach in 
https://issues.apache.org/jira/browse/CASSANDRA-265.
+  * Cassandra's public API is based on Thrift, which offers no streaming 
abilities -- any value written or fetched has to fit in memory.  This is 
inherent to Thrift's design; I don't see it changing.  So adding large object 
support to Cassandra would need a special API that manually split the large 
objects up into pieces.  Jonathan Ellis sketched out one approach in 
http://issues.apache.org/jira/browse/CASSANDRA-265.
   * The byte[] size of a value can't be more than 2^31-1 (the maximum length of a Java array).
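To make the super-column cost above concrete, here is a toy Python model (an illustration only, not Cassandra internals) of the three lookup levels: key and column name are indexed, subcolumn names are not, so one subcolumn read pays for every subcolumn in the supercolumn:

```python
# Toy model (not Cassandra code) of why subcolumn reads are expensive:
# keys and column names are indexed, subcolumn names are not.
def get_subcolumn(row, supercolumn_name, subcolumn_name):
    # Indexed: the column-name lookup finds the supercolumn directly.
    supercolumn = row[supercolumn_name]
    # Not indexed: every (name, value) pair in the supercolumn must be
    # deserialized just to locate the one requested subcolumn.
    subcolumns = dict(supercolumn)
    return subcolumns[subcolumn_name]
```

The work in the last step grows with the total number of subcolumns, which is why data models with very wide supercolumns should be avoided.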
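Since Thrift offers no streaming, the usual workaround is client-side chunking, as the large-object bullet suggests. A minimal sketch of the idea (the function names here are hypothetical, not the API proposed in CASSANDRA-265):

```python
CHUNK_SIZE = 2 ** 20  # 1 MB: comfortably under Thrift message limits
                      # and far below the 2^31-1 byte value cap

def split_blob(blob, chunk_size=CHUNK_SIZE):
    """Split a large value into pieces small enough to write
    individually through the Thrift API."""
    return [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]

def join_blob(chunks):
    """Reassemble chunks fetched back in order (e.g. stored under
    sequential column names so a slice returns them sorted)."""
    return b"".join(chunks)
```

Each chunk would be stored as its own column under one key, so the on-disk column ordering gives back the pieces in sequence.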
+ 
+ == Vulnerability to random data ==
+ 
+ Cassandra will crash if you send random data to the Thrift API.  See 
http://issues.apache.org/jira/browse/CASSANDRA-475 and 
http://issues.apache.org/jira/browse/THRIFT-601 for details.
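A sketch of the failure mode, assuming the unframed binary transport Cassandra used at the time: the Thrift binary protocol trusts a big-endian i32 length prefix read straight off the wire and allocates accordingly, so four random bytes can request a multi-gigabyte (or negative-size) buffer:

```python
import struct

def read_length(first_four_bytes):
    """What an unframed Thrift binary reader does with the first four
    bytes on the socket: trust them as a big-endian i32 length."""
    (length,) = struct.unpack(">i", first_four_bytes)
    return length

# Random garbage can decode to a near-2GB "string length" ...
print(read_length(b"\x7f\xff\xff\xff"))  # prints 2147483647
# ... or a negative one; neither was rejected before allocation.
print(read_length(b"\x9e\x3c\xa1\x07"))
```

This is only an illustration of the parsing step; it does not contact a server. See the two tickets above for the actual fixes.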
  
  == Obsolete Limitations ==
   * Prior to version 0.4, Cassandra did not fsync the commitlog before acking a write.  Most of the time this is Good Enough when you are writing to multiple replicas, since the odds are slim that all replicas die before the data actually hits the disk, but the truly paranoid will want real fsync-before-ack.  This is now an option.
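The fsync-before-ack option is selected in storage-conf.xml; the element names below are from memory of the 0.4 configuration and should be checked against your own file:

```xml
<!-- "batch" fsyncs the commitlog before acknowledging a write;
     "periodic" acknowledges immediately and fsyncs on a timer. -->
<CommitLogSync>batch</CommitLogSync>
<!-- In batch mode, writes arriving within this window share one fsync. -->
<CommitLogSyncBatchWindowInMS>1</CommitLogSyncBatchWindowInMS>
```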
