I think http://wiki.apache.org/cassandra/CassandraHardware answers most of your questions.
If possible, it's definitely useful to try out a small fraction of your anticipated workload against a test cluster, even a single node, before finalizing your production hardware purchase.

On Sat, Mar 6, 2010 at 1:12 AM, Rosenberry, Eric <eric.rosenbe...@iovation.com> wrote:
> I am looking for advice from others that are further along in deploying
> Cassandra in production environments than we are. I want to know what you
> are finding your bottlenecks to be. I would feel silly purchasing dual
> processor quad core 2.93 GHz Nehalem machines with 192 gigs of RAM just to
> find out that the two local SATA disks kept all that CPU and RAM from being
> useful (clearly that example would be dumb).
>
> I need to spec out hardware for an “optimal” Cassandra node (though our
> read/write characteristics are not yet fully defined so let’s go with an
> “average” configuration).
>
> My main concern is finding the right balance of:
>
> · Available CPU
> · RAM amount
> · RAM speed (think Nehalem architecture where memory comes in a few
>   speeds, though I doubt this is much of a concern as it is mainly dictated by
>   which processor you buy and how many slots you populate)
> · Total iops available (i.e. number of disks)
> · Total disk space available (depending on the ratio of iops/space
>   deciding on SAS vs. SATA and various rotational speeds)
>
> My current thinking is 1U boxes with four 3.5 inch disks since that seems to
> be a readily available config. One big question is should I go with a
> single processor Nehalem system to go with those four disks, or would two
> CPUs be useful, and also, how much RAM is appropriate to match? I am
> making the assumption that Cassandra nodes are going to be disk bound as
> they must do a random read to answer any given query (i.e. indexes in RAM,
> but all data lives on disk?).
>
> The other big decision is what type of hard disks others are finding to
> provide the optimal ratio of iops to available space?
> SAS or SATA? And what rotational speed?
>
> Let me throw out here an actual hardware config and feel free to tell me the
> error of my ways:
>
> · A SuperMicro SuperServer 6016T-NTRF configured as follows:
>   o 2.26 GHz E5520 dual processor quad core hyperthreaded Nehalem
>     architecture (this proc provides a lot of bang for the buck, faster procs
>     get more expensive quickly)
>   o Qty 12, 4 gig 1066 MHz DIMMs for a total of 48 gigs RAM (the 4 gig DIMMs
>     seem to be the price sweet spot)
>   o Dual on board 1 gigabit NICs (perhaps one for client connections and
>     the other for cluster communication?)
>   o Dual power supplies (I don’t want to lose half my cluster due to a
>     failure on one power leg)
>   o 4x 1TB SATA disks (this is a complete SWAG)
>   o No RAID controller (all just single individual disks presented to the
>     OS) – Though is there any down side to using a RAID controller with RAID 0
>     (perhaps one single disk for the log for sequential IOs, and 3x disks in a
>     stripe for the random IOs)
>   o The on-board IPMI based OOB controller (so we can kick the boxes
>     remotely if need be)
> · http://www.supermicro.com/products/system/1U/6016/SYS-6016T-NTRF.cfm
>
> I can’t help but think the above config has way too much RAM and CPU and not
> enough iops capacity. My understanding is that Cassandra does not cache
> much in RAM though?
>
> Any thoughts are appreciated. Thanks.
>
> -Eric
>
> _______________________________________________________________
> Eric Rosenberry
> Sr. Infrastructure Architect | Chief Bit Plumber
>
> iovation
> 111 SW Fifth Avenue
> Suite 3200
> Portland, OR 97204
> www.iovation.com
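
For a rough feel of whether a node like the one described could end up disk bound, here is a back-of-envelope sketch in Python. It assumes the common rule of thumb that a spinning disk serves about one random read per (average seek + half a rotation). The 8.5 ms seek time is an assumed figure for a 7200 rpm SATA drive, not a measurement, and the function names are made up for illustration. It ignores caching, queueing and controller effects, so treat the result as a rough ceiling to check against a real test cluster rather than a prediction.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one full revolution, in ms."""
    return 0.5 * 60_000.0 / rpm


def estimate_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rule-of-thumb random-read IOPS for one spinning disk."""
    service_time_ms = avg_seek_ms + rotational_latency_ms(rpm)
    return 1000.0 / service_time_ms


def node_random_iops(disks: int, rpm: float, avg_seek_ms: float) -> float:
    """Aggregate estimate, assuming reads spread evenly over the data disks."""
    return disks * estimate_random_iops(rpm, avg_seek_ms)


if __name__ == "__main__":
    # Assumed drive characteristics (hypothetical, adjust to the real spec sheet).
    RPM = 7200
    AVG_SEEK_MS = 8.5
    # Three data disks, as in the "one log disk + 3x stripe" layout in the quote.
    DATA_DISKS = 3

    per_disk = estimate_random_iops(RPM, AVG_SEEK_MS)
    total = node_random_iops(DATA_DISKS, RPM, AVG_SEEK_MS)
    print(f"per-disk random reads/s (estimate): {per_disk:.0f}")
    print(f"node random reads/s (estimate):     {total:.0f}")
```

Plugging in the figures for 10k or 15k rpm SAS drives (with their own seek times) gives a quick way to compare the iops-per-disk side of the SAS vs. SATA question before pricing out capacity.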