Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "LargeDataSetConsiderations" page has been changed by jeremyhanna:
https://wiki.apache.org/cassandra/LargeDataSetConsiderations?action=diff&rev1=24&rev2=25

  ==== Other points to consider: ====
   * Disk space usage in Cassandra can vary over time:
-   * Compaction: with the !SizeTieredCompactionStrategy, compaction can up to double the disk space used. With the !LeveledCompactionStrategy, usually only requires about 10% overhead (see http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra).
+   * Compaction: with the !SizeTieredCompactionStrategy, compaction can up to double the disk space used. The !LeveledCompactionStrategy usually only requires about 10% overhead (see http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra).
    * Repair: repair operations can increase disk space demands; see http://www.datastax.com/dev/blog/advanced-repair-techniques for details and how it can be improved.
   * Consider the choice of file system. Removal of large files is notoriously slow and seek bound on e.g. ext2/ext3. Consider xfs or ext4fs. This affects background unlink():ing of sstables that happens every now and then, and also affects start-up time (if there are sstables pending removal when a node is starting up, they are removed as part of the start-up process; it may thus be detrimental if removing a terabyte of sstables takes an hour (numbers are ballpark figures, not accurately measured, and depend on circumstances)).
   * Adding nodes is a slow process if each node is responsible for a large amount of data. Plan for this; do not try to throw additional hardware at a cluster at the last minute.
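
The compaction figures above lend themselves to a quick headroom check. The following is a minimal sketch only: the overhead factors are the ballpark numbers quoted in the page ("up to double" for size-tiered, "about 10%" for leveled), and the names COMPACTION_OVERHEAD, peak_disk_gb and fits_on_disk are hypothetical helpers for illustration, not part of Cassandra. It ignores the extra space repair can require.

```python
# Rough disk headroom estimate per node during compaction.
# Overhead factors are the ballpark figures quoted in the wiki page,
# not measured values; adjust them for your own workload.
COMPACTION_OVERHEAD = {
    "SizeTieredCompactionStrategy": 1.00,  # compaction can up to double disk usage
    "LeveledCompactionStrategy": 0.10,     # usually about 10% overhead
}


def peak_disk_gb(live_data_gb: float, strategy: str) -> float:
    """Return an approximate worst-case disk footprint (GB) during compaction."""
    if live_data_gb < 0:
        raise ValueError("live_data_gb must be non-negative")
    overhead = COMPACTION_OVERHEAD[strategy]  # KeyError for unknown strategies
    return live_data_gb * (1.0 + overhead)


def fits_on_disk(live_data_gb: float, disk_gb: float, strategy: str) -> bool:
    """True if the estimated peak footprint fits within the given disk size."""
    return peak_disk_gb(live_data_gb, strategy) <= disk_gb


if __name__ == "__main__":
    # Hypothetical node: 500 GB of live data on an 800 GB volume.
    for name in COMPACTION_OVERHEAD:
        print(name, fits_on_disk(500, 800, name))
```

With these assumed factors, a node holding 500 GB on an 800 GB volume would be at risk under size-tiered compaction but comfortable under leveled compaction.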
