Steve is right, and to try and add more clarification...

> Interesting choice; the 7 core in a single CPU option is something else
> to consider. Remember also this is a moving target, what anyone says is
> valid now (Feb 2011) will be seen as quaint in two years' time. Even a
> few months from now, what is the best value for a cluster will have moved on.

I've never seen a 7 core chip. 4, 6, and now 8? (Cores, not including hyper-threading.)

The point Steve is making is that the price point for picking the optimum hardware continues to move, and what we see today doesn't mean we won't see a better optimal configuration later. More importantly, what is optimal for one user isn't going to be optimal for another.

The other issue that we haven't even talked about is whether you want to go 'white box' and build your own, or whether your IT shop picks up the phone and calls Dell, HP, IBM, or whoever supplies your hardware. That too will limit your options and affect your budget.

In addition, you have to look at what are realistic expectations. There are a lot of factors that you have to weigh when making hardware decisions, including how clean your developers' code is going to be and what resources they will require to run on your cloud. And if you run HBase or Cloudbase, you add more variables.

The key is finding out which combination of variables is going to be the most important for you to get the most out of your hardware.

Ok... I'll get off my soapbox for now and go get my first cup of coffee. :-)

-Mike

> Date: Mon, 14 Feb 2011 11:23:13 +0000
> From: [email protected]
> To: [email protected]
> Subject: Re: recommendation on HDDs
>
> On 12/02/11 16:26, Michael Segel wrote:
> >
> > All,
> >
> > I'd like to clarify some things...
> >
> > First the concept is to build out a cluster of commodity hardware.
> > So when you do your shopping you want to get the most bang for your buck.
> > That is the 'sweet spot' that I'm talking about.
> > When you look at your E5500 or E5600 chip sets, you will want to go with 4
> > cores per CPU, dual CPU and a clock speed around 2.53GHz or so.
> > (Faster chips are more expensive and the performance edge falls off so you
> > end up paying a premium.)
>
> Interesting choice; the 7 core in a single CPU option is something else
> to consider.
> Remember also this is a moving target, what anyone says is
> valid now (Feb 2011) will be seen as quaint in two years' time. Even a
> few months from now, what is the best value for a cluster will have moved on.
>
> >
> > Looking at your disks, you start with using the on board SATA controller.
> > Why? Because it means you don't have to pay for a controller card.
> > If you are building a cluster for general purpose computing... Assuming 1U
> > boxes you have room for 4 3.5" SATA which still give you the best
> > performance for your buck.
> > Can you go with 2.5"? Yes, but you are going to be paying a premium.
> >
> > Price wise, a 2TB SATA II 7200 RPM drive is going to be your best deal. You
> > could go with SATA III drives if your motherboard supports the SATA III
> > ports, but you're still paying a slight premium.
> >
> > The OP felt that all he would need was 1TB of disk and was considering 4
> > 250GB drives. (More spindles...yada yada yada...)
> >
> > My suggestion is to forget that nonsense and go with one 2 TB drive because
> > it's a better deal and if you want to add more disk to the node, you can.
> > (It's easier to add disk than it is to replace it.)
> >
> > Now do you need to create a spare OS drive? No. Some people who have an
> > internal 3.5 space sometimes do. That's ok, and you can put your hadoop
> > logging there. (Just make sure you have a lot of disk space...)
>
> One advantage of a specific drive for OS and log (in a separate
> partition) is you can re-image it without losing data you care about,
> and swap in a replacement fast. If you have a small cluster set up for
> hotswap, that reduces the time a node is down -just have a spare OS HDD
> ready to put in. OS disks are the ones you care about when they fail,
> the others are more "mildly concerned about the failure rate" than
> something to page you over.
>
> >
> > The truth is that there really isn't any single *right* answer.
> > There are a lot of options and budget constraints as well as physical
> > constraints like power, space, and location of the hardware.
>
> +1. don't forget weight either.
>
> >
> > Also you may be building out a cluster whose main purpose is to be a backup
> > location for your cluster. So your production cluster has lots of nodes.
> > Your backup cluster has lots of disks per node because your main focus is
> > as much storage per node.
> >
> > So here you may end up buying a 4U rack box, load it up with 3.5" drives
> > and a couple of SATA controller cards. You care less about performance but
> > more about storage space. Here you may say 3TB SATA drives w 12 or more per
> > box. (I don't know how many you can fit in to a 4U chassis these days.) So
> > you have 10 DN backing up a 100+ DN cluster in your main data center. But
> > that's another story.
>
> You can get 12 HDDs in a 1U if you ask nicely. but in a small cluster
> there's a cost, that server can be a big chunk of your filesystem, and
> if it goes down there's up to 24TB worth of replication going to take
> place over the rest of the network, so you'll need at least 24TB of
> spare capacity on the other machines, ignoring bandwidth issues.
>
> >
> > I think the main take away you should have is that if you look at the price
> > point... your best price per GB is on a 2TB drive until the prices drop on
> > 3TB drives.
> > Since the OP believes that their requirement is 1TB per node... a single
> > 2TB would be the best choice. It allows for additional space and you really
> > shouldn't be too worried about disk i/o being your bottleneck.
> >
>
> One less thing to worry about is good.
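
For anyone who wants to play with the re-replication numbers in Steve's comment quoted above (12 HDDs in one box, up to 24TB to re-replicate if it dies), here is a rough back-of-the-envelope sketch. It is only an illustration: the function names and the cluster sizes in the loop are made up for the example, the 2TB drive size is inferred from 24TB / 12 drives, and it assumes the lost data spreads evenly over the surviving nodes and ignores bandwidth, just as the comment does.

```python
def node_capacity_tb(drives: int, drive_tb: float) -> float:
    """Raw capacity of one node: number of drives times drive size."""
    return drives * drive_tb


def spare_tb_per_survivor(failed_node_tb: float, surviving_nodes: int) -> float:
    """Spare space each surviving node needs to absorb a failed node's data.

    Assumption: the re-replicated data spreads evenly across survivors.
    """
    if surviving_nodes < 1:
        raise ValueError("need at least one surviving node")
    return failed_node_tb / surviving_nodes


if __name__ == "__main__":
    # 12 drives per node comes from the thread; 2TB per drive is inferred.
    lost_tb = node_capacity_tb(drives=12, drive_tb=2)
    # Hypothetical cluster sizes, chosen only to show the trend.
    for cluster_size in (5, 10, 50):
        survivors = cluster_size - 1
        per_node = spare_tb_per_survivor(lost_tb, survivors)
        print(f"{cluster_size} nodes: {per_node:.2f} TB spare needed per survivor")
```

The smaller the cluster, the bigger the slice of spare space each remaining box has to keep free, which is the cost of dense nodes in a small cluster.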
