The "LargeDataSetConsiderations" page has been changed by jeremyhanna:
https://wiki.apache.org/cassandra/LargeDataSetConsiderations?action=diff&rev1=25&rev2=26

   * Consider the choice of file system. Removal of large files is notoriously 
slow and seek bound on e.g. ext2/ext3. Consider xfs or ext4fs. This affects 
background unlink():ing of sstables that happens every now and then, and also 
affects start-up time (if there are sstables pending removal when a node is 
starting up, they are removed as part of the start-up process; it may thus be 
detrimental if removing a terabyte of sstables takes an hour (numbers are 
ballpark figures, not accurately measured, and depend on circumstances)).
   * Adding nodes is a slow process if each node is responsible for a large 
amount of data. Plan for this; do not try to throw additional hardware at a 
cluster at the last minute.
   * The operating system's page cache is affected by compaction and repair 
operations. If you are relying on the page cache to keep the active set in 
memory, you may see significant degradation in performance as a result of 
compaction and repair operations.  See cassandra.yaml for settings to 
reduce this impact.
-  * The partition (or sampled) index entries for each sstable can start to add 
up.  You can reduce the memory usage by tuning the interval that it samples at. 
 The setting is index_interval the cassandra.yaml.  See the comments there for 
more information.
+  * The partition (or sampled) index entries for each sstable can start to add 
up.  You can reduce the memory usage by tuning the interval that it samples at. 
 The setting is index_interval in cassandra.yaml.  See the comments there for 
more information.
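
To get a feel for how the sampled index entries add up, here is a rough 
back-of-the-envelope sketch. It assumes one entry is kept in memory for every 
index_interval-th partition key of each sstable. The function names and the 
bytes_per_entry figure are hypothetical placeholders, not values taken from 
Cassandra; measure on your own cluster before relying on any number.

```python
import math


def sampled_index_entries(partitions_in_sstable: int, index_interval: int) -> int:
    """Entries held in memory for one sstable, assuming every
    index_interval-th partition key is sampled."""
    if index_interval < 1:
        raise ValueError("index_interval must be at least 1")
    return math.ceil(partitions_in_sstable / index_interval)


def estimated_sample_bytes(partition_counts, index_interval, bytes_per_entry):
    """Rough total memory for the samples of all sstables on a node.

    bytes_per_entry is an assumed average (key bytes plus overhead);
    it is a placeholder, not a figure published by Cassandra.
    """
    entries = sum(sampled_index_entries(n, index_interval) for n in partition_counts)
    return entries * bytes_per_entry


if __name__ == "__main__":
    # Hypothetical node: 4 sstables holding 50 million partitions each.
    sstables = [50_000_000] * 4
    for interval in (128, 512):
        total = estimated_sample_bytes(sstables, interval, bytes_per_entry=64)
        print(f"index_interval={interval}: ~{total / 1e6:.1f} MB of samples")
```

Raising index_interval shrinks the estimate roughly in proportion, at the cost 
of more work per key lookup, which is the trade-off described in the 
cassandra.yaml comments.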
  
  Other references to improvements:
   * [[http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2|Performance improvements in Cassandra 1.2]]
