Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "LargeDataSetConsiderations" page has been changed by jeremyhanna:
https://wiki.apache.org/cassandra/LargeDataSetConsiderations?action=diff&rev1=24&rev2=25

  ==== Other points to consider: ====
  
   * Disk space usage in Cassandra can vary over time:
-   * Compaction: with the !SizeTieredCompactionStrategy, compaction can up to 
double the disk space used.  With the !LeveledCompactionStrategy, usually only 
requires about 10% overhead (see 
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra).
+   * Compaction: with the !SizeTieredCompactionStrategy, compaction can up to 
double the disk space used.  The !LeveledCompactionStrategy usually only 
requires about 10% overhead (see 
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra).
    * Repair: repair operations can increase disk space demands; see 
http://www.datastax.com/dev/blog/advanced-repair-techniques for details and for 
ways to reduce that overhead.
   * Consider the choice of file system. Removal of large files is notoriously 
slow and seek-bound on e.g. ext2/ext3. Consider xfs or ext4. This affects the 
background unlink() of sstables that happens every now and then, and also 
affects start-up time: if there are sstables pending removal when a node is 
starting up, they are removed as part of the start-up process, so it may be 
detrimental if removing a terabyte of sstables takes an hour (numbers are 
ballpark figures, not accurately measured, and depend on circumstances).
   * Adding nodes is a slow process if each node is responsible for a large 
amount of data. Plan for this; do not try to throw additional hardware at a 
cluster at the last minute.
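
As a rough illustration of the compaction point above, the disk headroom per 
node could be estimated as below. This is only a sketch: the function names and 
the overhead table are hypothetical, the percentages are the ballpark figures 
quoted above (up to 100% for size-tiered, about 10% for leveled), and the extra 
space that repair may need is not modeled.

```python
# Ballpark compaction overhead, in percent of live data, taken from the
# figures quoted in the list above. These are assumptions, not measurements.
COMPACTION_OVERHEAD_PERCENT = {
    "SizeTieredCompactionStrategy": 100,  # compaction can up to double usage
    "LeveledCompactionStrategy": 10,      # usually about 10% overhead
}


def peak_disk_bytes(live_data_bytes, strategy):
    """Estimate peak disk usage during compaction (hypothetical helper)."""
    overhead = COMPACTION_OVERHEAD_PERCENT[strategy]
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return live_data_bytes * (100 + overhead) // 100


def fits_on_disk(live_data_bytes, strategy, disk_capacity_bytes):
    """Return True if the estimated peak usage fits within the disk capacity."""
    return peak_disk_bytes(live_data_bytes, strategy) <= disk_capacity_bytes


if __name__ == "__main__":
    one_tb = 10**12
    for name in COMPACTION_OVERHEAD_PERCENT:
        print(name, peak_disk_bytes(one_tb, name))
```

With these assumed figures, a node holding 1 TB of live data on a 1.5 TB disk 
would pass the check for leveled compaction but not for size-tiered compaction.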
