[Cassandra Wiki] Update of "CassandraHardware" by JonathanEllis

Apache Wiki Thu, 19 Nov 2009 08:31:31 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.


The "CassandraHardware" page has been changed by JonathanEllis.
http://wiki.apache.org/cassandra/CassandraHardware

--------------------------------------------------

New page:
=== Memory ===
The most recently written data resides in memory tables (aka 
[[MemtableThresholds|memtables]]), but older data that has been flushed to disk 
can be kept in the OS's file-system cache. In other words, ''the more memory, 
the better'', with 1GB being the minimum recommended.

=== CPU ===
Many workloads will actually be CPU-bound in Cassandra before being 
memory-bound.  Cassandra is highly concurrent and will make good use of however 
many cores you can give it.


=== Disk ===
The short answer here is ''at least 2 disks'': one to keep your 
`CommitLogDirectory` on, the other to use in `DataFileDirectories`. The exact 
answer, though, depends a lot on your usage, so it's important to understand 
what is going on here.

Cassandra persists data to disk for two very different purposes. The first is 
the commit log: every new write is recorded there so that it can be replayed 
after a crash or system shutdown. The second is the data files: when thresholds 
are exceeded, memtables are flushed to disk as SSTables.

Commit logs receive every write made to a Cassandra node and have the potential 
to block client operations, but they are only ever read on node start-up. 
SSTable writes, on the other hand, occur asynchronously, but SSTables are read 
to satisfy client look-ups. SSTables are also periodically merged and rewritten 
in a process called ''compaction''. Another important distinction is that 
commit logs are purged after the corresponding data has been flushed to disk as 
an SSTable, so `CommitLogDirectory` only holds data that has not yet been 
flushed, while the directories in `DataFileDirectories` store all of the data 
written to a node.
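
As a rough illustration of the difference between the two kinds of disk 
activity, here is a toy model in Python. It is only a sketch and not 
Cassandra's actual code: the class `ToyNode`, its methods and the 
`memtable_threshold` parameter are made up for this example, and the flush 
happens synchronously here whereas the real one is asynchronous.

```python
class ToyNode:
    """Toy model of the write path; all names here are hypothetical."""

    def __init__(self, memtable_threshold=3):
        self.commit_log = []   # appended on every write, read only on replay
        self.memtable = {}     # most recently written data, held in memory
        self.sstables = []     # flushed tables, oldest first
        self.memtable_threshold = memtable_threshold

    def write(self, key, value):
        # Every write goes to the commit log first, on the client's path.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_threshold:
            self.flush()

    def flush(self):
        # Persist the memtable as a new SSTable. Once that is done, the
        # matching commit log entries are no longer needed and are purged.
        self.sstables.append(dict(self.memtable))
        self.memtable.clear()
        self.commit_log.clear()

    def read(self, key):
        # Look-ups check memory first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None

    def compact(self):
        # Merge all SSTables into one; newer values overwrite older ones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [merged]

    def replay(self):
        # After a crash the memtable is gone; rebuild it from the commit log.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value
```

In this model the commit log never grows beyond what is still in the memtable, 
while the SSTables accumulate everything ever written, which mirrors the 
sizing difference between the two kinds of directory.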

So to summarize, use a different device for your `CommitLogDirectory`; it 
needn't be large, but it should be fast enough to receive all of your writes. 
Then, use one or more devices for `DataFileDirectories` and make sure they are 
large enough to house all of your data and fast enough to satisfy your reads 
and to keep up with flushing and compaction.
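
One way to sanity-check such a layout is to compare the device IDs that the 
operating system reports for each directory. The Python sketch below does this 
with `os.stat`; the function names and any paths you pass in are assumptions 
for illustration, not part of Cassandra. Note that `st_dev` identifies a 
mounted file system, so two partitions on the same physical disk will still 
look like different devices.

```python
import os


def on_same_device(path_a, path_b):
    """Return True if both paths live on the same mounted file system."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def data_dirs_sharing_commit_log_device(commit_log_dir, data_dirs):
    """Return the data directories that share a device with the commit log."""
    return [d for d in data_dirs if on_same_device(commit_log_dir, d)]
```

An empty result means that, as far as the file system can tell, the commit log 
directory is on a device of its own.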
