let us know how the SSDs pan out, I am curious about that as well

cheers,
jesse
--
jesse mcconnell
jesse.mcconn...@gmail.com

On Tue, Mar 9, 2010 at 12:08, B. Todd Burruss <bburr...@real.com> wrote:
> our dataset is too big to fit into cache, so we are hitting disk. not a
> problem for normal operation, but when a node is restored, hinted handoff,
> load balanced, or if reads/writes simply build up we see a problem. the
> nodes can't seem to catch up. this seems to be centered around drive seek
> time, not cassandra per se.
>
> to combat this we are doing the following:
>
> - add more smaller drives per machine in RAID 0 to combat drive seek time.
> - scale horizontally - add more machines to cluster to spread the load
> - we also plan to try out SSDs as well.
>
>
> Jonathan Ellis wrote:
>>
>> Yes, but I would guess 90% of workloads are better served with
>> spending the extra money on more machines w/ cheap sata disks and lots
>> of ram.
>>
>> -Jonathan
>>
>> On Sun, Mar 7, 2010 at 1:00 PM, Boris Shulman <shulm...@gmail.com> wrote:
>>
>>>
>>> Do you think having SAS disks will give better performance?
>>>
>>> On Sat, Mar 6, 2010 at 5:47 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>
>>>>
>>>> I think http://wiki.apache.org/cassandra/CassandraHardware answers
>>>> most of your questions.
>>>>
>>>> If possible, it's definitely useful to try out a small fraction of
>>>> your anticipated workload against a test cluster, even a single node,
>>>> before finalizing your production hardware purchase.
>>>>
>>>> On Sat, Mar 6, 2010 at 1:12 AM, Rosenberry, Eric
>>>> <eric.rosenbe...@iovation.com> wrote:
>>>>
>>>>>
>>>>> I am looking for advice from others that are further along in deploying
>>>>> Cassandra in production environments than we are. I want to know what
>>>>> you are finding your bottlenecks to be.
>>>>> I would feel silly purchasing dual processor quad core 2.93ghz
>>>>> Nehalem machines with 192 gigs of RAM just to find out that the two
>>>>> local SATA disks kept all that CPU and RAM from being useful (clearly
>>>>> that example would be a dumb one).
>>>>>
>>>>> I need to spec out hardware for an “optimal” Cassandra node (though our
>>>>> read/write characteristics are not yet fully defined so let’s go with
>>>>> an “average” configuration).
>>>>>
>>>>> My main concern is finding the right balance of:
>>>>>
>>>>> · Available CPU
>>>>> · RAM amount
>>>>> · RAM speed (think Nehalem architecture where memory comes in a few
>>>>>   speeds, though I doubt this is much of a concern as it is mainly
>>>>>   dictated by which processor you buy and how many slots you populate)
>>>>> · Total iops available (i.e. number of disks)
>>>>> · Total disk space available (depending on the ratio of iops/space
>>>>>   deciding on SAS vs. SATA and various rotational speeds)
>>>>>
>>>>> My current thinking is 1U boxes with four 3.5 inch disks since that
>>>>> seems to be a readily available config. One big question is should I go
>>>>> with a single processor Nehalem system to go with those four disks, or
>>>>> would two CPUs be useful, and also, how much RAM is appropriate to
>>>>> match? I am making the assumption that Cassandra nodes are going to be
>>>>> disk bound as they must do a random read to answer any given query
>>>>> (i.e. indexes in RAM, but all data lives on disk?).
>>>>>
>>>>> The other big decision is what type of hard disks others are finding to
>>>>> provide the optimal ratio of iops to available space? SAS or SATA? And
>>>>> what rotational speed?
>>>>>
>>>>> Let me throw out here an actual hardware config and feel free to tell
>>>>> me the error of my ways:
>>>>>
>>>>> · A SuperMicro SuperServer 6016T-NTRF configured as follows:
>>>>>
>>>>>   o 2.26 ghz E5520 dual processor quad core hyperthreaded Nehalem
>>>>>     architecture (this proc provides a lot of bang for the buck, faster
>>>>>     procs get more expensive quickly)
>>>>>   o Qty 12, 4 gig 1066mhz DIMMs for a total of 48 gigs RAM (the 4 gig
>>>>>     DIMMs seem to be the price sweet spot)
>>>>>   o Dual on board 1 gigabit NICs (perhaps one for client connections
>>>>>     and the other for cluster communication?)
>>>>>   o Dual power supplies (I don’t want to lose half my cluster due to a
>>>>>     failure on one power leg)
>>>>>   o 4x 1TB SATA disks (this is a complete SWAG)
>>>>>   o No RAID controller (all just single individual disks presented to
>>>>>     the OS) – Though is there any down side to using a RAID controller
>>>>>     with RAID 0 (perhaps one single disk for the log for sequential
>>>>>     io’s, and 3x disks in a stripe for the random io’s)
>>>>>   o The on-board IPMI based OOB controller (so we can kick the boxes
>>>>>     remotely if need be)
>>>>>
>>>>> · http://www.supermicro.com/products/system/1U/6016/SYS-6016T-NTRF.cfm
>>>>>
>>>>> I can’t help but think the above config has way too much RAM and CPU
>>>>> and not enough iops capacity. My understanding is that Cassandra does
>>>>> not cache much in RAM though?
>>>>>
>>>>> Any thoughts are appreciated. Thanks.
>>>>>
>>>>> -Eric
>>>>>
>>>>> _______________________________________________________________
>>>>> Eric Rosenberry
>>>>> Sr. Infrastructure Architect | Chief Bit Plumber
>>>>>
>>>>> iovation
>>>>> 111 SW Fifth Avenue
>>>>> Suite 3200
>>>>> Portland, OR 97204
>>>>> www.iovation.com
>>>>>
>>>>> The information contained in this email message may be privileged,
>>>>> confidential and protected from disclosure. If you are not the intended
>>>>> recipient, any dissemination, distribution or copying is strictly
>>>>> prohibited. If you think that you have received this email message in
>>>>> error, please notify the sender by reply email and delete the message
>>>>> and any attachments.
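
One rough way to reason about the iops-versus-space question raised in this thread is to estimate random reads per second for a single spindle from its rotational speed and average seek time, then multiply by the number of disks. The Python sketch below does only that. The seek times and drive sizes are illustrative assumptions, not measurements or vendor figures, and the function names (`random_iops`, `node_summary`) are made up for this example. It ignores caching, queueing and controller effects, so treat the output as a starting point for testing against a real workload rather than a sizing answer.

```python
# Back-of-envelope random-read estimate for spinning disks.
# Every drive figure used below is an ASSUMED, illustrative value.


def random_iops(rpm, avg_seek_ms):
    """Approximate random-read iops for one spindle.

    Service time per read = average seek + average rotational latency,
    where average rotational latency is half a revolution.
    """
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)


def node_summary(disks, rpm, avg_seek_ms, size_gb):
    """Aggregate iops and capacity for `disks` spindles in one node.

    Assumes random reads spread evenly over all spindles (the best case
    for a RAID 0 stripe or for independent data directories).
    """
    iops = disks * random_iops(rpm, avg_seek_ms)
    capacity_gb = disks * size_gb
    return {
        "iops": iops,
        "capacity_gb": capacity_gb,
        "iops_per_gb": iops / capacity_gb,
    }


# Hypothetical configurations; seek times and sizes are assumptions.
configs = {
    "4x 1TB 7200rpm SATA": node_summary(4, 7200, 8.5, 1000),
    "4x 300GB 15k SAS": node_summary(4, 15000, 3.5, 300),
}

for name, summary in configs.items():
    print(
        f"{name}: {summary['iops']:.0f} iops, "
        f"{summary['capacity_gb']} GB, "
        f"{summary['iops_per_gb']:.3f} iops/GB"
    )
```

Under those assumed numbers a single 7200 rpm drive comes out at about 79 random reads per second and a 15k drive at about 182, which is why adding spindles (or machines) moves the needle more than adding CPU when the data set does not fit in cache.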