The "CassandraHardware" page has been changed by JonathanEllis.
http://wiki.apache.org/cassandra/CassandraHardware

--------------------------------------------------

New page:
=== Memory ===
The most recently written data resides in memory tables (aka
[[MemtableThresholds|memtables]]), and older data that has been flushed to disk
can be kept in the OS's file-system cache. Both uses benefit from more RAM, so
''the more memory, the better'', with 1GB being the recommended minimum.

=== CPU ===
Many workloads will actually be CPU-bound in Cassandra before being 
memory-bound.  Cassandra is highly concurrent and will make good use of however 
many cores you can give it.


=== Disk ===
The short answer is ''at least 2 disks'': one for your `CommitLogDirectory`,
the other for `DataFileDirectories`. The exact answer depends heavily on your
usage, so it's important to understand how Cassandra uses its disks.

Cassandra persists data to disk for two very different purposes. The first is
the commit log: each new write is recorded there so that it can be replayed
after a crash or system shutdown. The second is the SSTable: when thresholds
are exceeded, memtables are flushed to disk as SSTables.

Commit logs receive every write made to a Cassandra node and have the potential
to block client operations, but they are only ever read on node start-up.
SSTable writes, on the other hand, occur asynchronously, but SSTables are read
to satisfy client look-ups. SSTables are also periodically merged and rewritten
in a process called ''compaction''. Another important distinction is that
commit logs are purged after the corresponding data has been flushed to disk as
an SSTable, so `CommitLogDirectory` only holds data that has not yet been
flushed, while the directories in `DataFileDirectories` store all of the data
written to a node.
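
The write path described above can be sketched as a toy model. This is only an
illustration, not Cassandra's actual code: the class `ToyNode`, its methods, and
the `flush_threshold` parameter are hypothetical names, and the flush here runs
synchronously for simplicity, whereas real SSTable writes are asynchronous.

```python
class ToyNode:
    """Toy model of the commit log / memtable / SSTable flow (illustrative only)."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: max keys per memtable
        self.commit_log = []  # stands in for files under CommitLogDirectory
        self.memtable = {}    # most recently written data, held in memory
        self.sstables = []    # stands in for files under DataFileDirectories

    def write(self, key, value):
        # Every write is recorded in the commit log so it could be replayed
        # after a crash, then applied to the in-memory table.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The memtable is written out as a new SSTable; the commit log entries
        # it covered are no longer needed and are purged.
        self.sstables.append(dict(self.memtable))
        self.memtable.clear()
        self.commit_log.clear()

    def read(self, key):
        # Look-ups check the memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None

    def compact(self):
        # Merge all SSTables into one; newer values override older ones.
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        self.sstables = [merged] if merged else []
```

In this model the commit log is written on every call to `write` but never read
during normal operation, while the SSTable list grows with each flush and is
consulted on reads, which mirrors the different I/O patterns of the two
directories.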

To summarize, use a separate device for your `CommitLogDirectory`; it needn't
be large, but it should be fast enough to receive all of your writes. Then use
one or more devices for `DataFileDirectories`, and make sure they are large
enough to house all of your data and fast enough to satisfy your reads and to
keep up with flushing and compaction.
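
One rough way to check the separation is to compare the device IDs that the
operating system reports for each directory. The sketch below uses Python's
`os.stat`; the function names are hypothetical helpers, not part of Cassandra.
Note that `st_dev` identifies a file system rather than a physical disk, so two
partitions on the same spindle would still look "separate" here.

```python
import os


def shares_device(path_a, path_b):
    """Return True if both paths live on the same file-system device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def data_dirs_on_commitlog_device(commitlog_dir, data_dirs):
    """List the data directories that share a device with the commit log.

    An empty result suggests the commit log has a file system to itself;
    it does not prove the underlying physical disks are distinct.
    """
    return [d for d in data_dirs if shares_device(commitlog_dir, d)]
```

It would be called with whatever paths you configured for `CommitLogDirectory`
and `DataFileDirectories`; any directory it returns is competing with the
commit log for the same device.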
