Bandwidth is definitely better with more active spindles. I would recommend several larger disks. The cost is very nearly the same.

On Fri, Feb 11, 2011 at 3:52 PM, Shrinivas Joshi <[email protected]> wrote:

> Thanks for your inputs, Michael. We have 6 open SATA ports on the
> motherboards. That is the reason why we are thinking of 4 to 5 data disks
> and 1 OS disk.
> Are you suggesting use of one 2TB disk instead of four 500GB disks, let's
> say?
> I thought that the HDFS utilization/throughput increases with the # of
> disks per node (assuming that the total usable IO bandwidth increases
> proportionally).
>
> -Shrinivas
>
> On Thu, Feb 10, 2011 at 4:25 PM, Michael Segel <[email protected]> wrote:
>
> > Shrinivas,
> >
> > Assuming you're in the US, I'd recommend the following:
> >
> > Go with 2TB 7200 RPM SATA hard drives.
> > (Not sure what type of hardware you have)
> >
> > What we've found is that in the data nodes, there's an optimal
> > configuration that balances price versus performance.
> >
> > While your chassis may hold 8 drives, how many open SATA ports are on
> > the motherboard? Since you're using JBOD, you don't want the additional
> > expense of having to purchase a separate controller card for the
> > additional drives.
> >
> > I'm running Seagate drives at home and I haven't had any problems for
> > years.
> > When you look at your drive, you need to know total storage, speed
> > (rpms), and cache size.
> > Looking at Microcenter's pricing...
> > A 2TB 3.0Gb/s SATA Hitachi was $110.00
> > A 1TB Seagate was $70.00
> > A 250GB SATA drive was $45.00
> >
> > So 2TB = $110, $140, $360 (respectively)
> >
> > So you get a better deal on 2TB.
> >
> > So if you go out and get more drives but of lower density, you'll end up
> > spending more money and using more energy, but I doubt you'll see a real
> > performance difference.
> >
> > The other thing is that if you want to add more disk, you have room to
> > grow. (Just add more disk and restart the node, right?)
> > If all of your disk slots are filled, you're SOL. You have to take out
> > the box, replace all of the drives, then add it to the cluster as a
> > 'new' node.
> >
> > Just my $0.02.
> >
> > HTH
> >
> > -Mike
> >
> > > Date: Thu, 10 Feb 2011 15:47:16 -0600
> > > Subject: Re: recommendation on HDDs
> > > From: [email protected]
> > > To: [email protected]
> > >
> > > Hi Ted, Chris,
> > >
> > > Much appreciate your quick reply. The reason why we are looking for
> > > smaller capacity drives is because we are not anticipating a huge
> > > growth in data footprint and also read somewhere that the larger the
> > > capacity of the drive, the bigger the number of platters in it, and
> > > that could affect drive performance. But it looks like you can get 1TB
> > > drives with only 2 platters. Large capacity drives should be OK for us
> > > as long as they perform equally well.
> > >
> > > Also, the systems that we have can host up to 8 SATA drives in them.
> > > In that case, would backplanes offer additional advantages?
> > >
> > > Any suggestions on 5400 vs. 7200 vs. 10000 RPM disks? I guess 10K rpm
> > > disks would be overkill comparing their perf/cost advantage?
> > >
> > > Thanks for your inputs.
> > >
> > > -Shrinivas
> > >
> > > On Thu, Feb 10, 2011 at 2:48 PM, Chris Collins <[email protected]> wrote:
> > >
> > > > Of late we have had serious issues with Seagate drives in our Hadoop
> > > > cluster. These were purchased over several purchasing cycles and
> > > > I'm pretty sure it wasn't just a single "bad batch". Because of this
> > > > we switched to buying 2TB Hitachi drives, which seem to have been
> > > > considerably more reliable.
> > > >
> > > > Best
> > > >
> > > > C
> > > > On Feb 10, 2011, at 12:43 PM, Ted Dunning wrote:
> > > >
> > > > > Get bigger disks. Data only grows and having extra is always good.
> > > > >
> > > > > You can get 2TB drives for < $100 and 1TB for < $75.
> > > > >
> > > > > As far as transfer rates are concerned, any 3Gb/s SATA drive is
> > > > > going to be about the same (ish). Seek times will vary a bit with
> > > > > rotation speed, but with Hadoop, you will be doing long reads and
> > > > > writes.
> > > > >
> > > > > Your controller and backplane will have a MUCH bigger vote in
> > > > > getting acceptable performance. With only 4 or 5 drives, you don't
> > > > > have to worry about a super-duper backplane, but you can still
> > > > > kill performance with a lousy controller.
> > > > >
> > > > > On Thu, Feb 10, 2011 at 12:26 PM, Shrinivas Joshi <[email protected]> wrote:
> > > > >
> > > > >> What would be a good hard drive for a 7 node cluster which is
> > > > >> targeted to run a mix of IO and CPU intensive Hadoop workloads?
> > > > >> We are looking for around 1 TB of storage on each node
> > > > >> distributed amongst 4 or 5 disks. So either 250GB * 4 disks or
> > > > >> 160GB * 5 disks. Also it should be less than $100 each ;)
> > > > >>
> > > > >> I looked at HDD benchmark comparisons on tomshardware,
> > > > >> storagereview etc. Got overwhelmed with the # of benchmarks and
> > > > >> different aspects of HDD performance.
> > > > >>
> > > > >> Appreciate your help on this.
> > > > >>
> > > > >> -Shrinivas
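
For anyone who wants to rerun the price comparison from this thread with current numbers, here is a minimal Python sketch. It only uses the three Microcenter prices Mike quoted; the names `QUOTED_PRICES`, `cost_for_capacity` and `compare` are made up for illustration, and it ignores power, controller cost and failure rates.

```python
import math

# Per-drive prices quoted in the thread (Microcenter, Feb 2011).
# Keys are drive capacity in GB, values are price in USD.
QUOTED_PRICES = {2000: 110.00, 1000: 70.00, 250: 45.00}


def cost_for_capacity(target_gb, drive_gb, drive_price):
    """Return (drive_count, total_cost) needed to reach at least target_gb."""
    count = math.ceil(target_gb / drive_gb)
    return count, count * drive_price


def compare(target_gb=2000, prices=QUOTED_PRICES):
    """List (drive_gb, count, total_cost, cost_per_tb), largest drives first."""
    rows = []
    for drive_gb, price in sorted(prices.items(), reverse=True):
        count, total = cost_for_capacity(target_gb, drive_gb, price)
        bought_tb = count * drive_gb / 1000  # capacity actually purchased
        rows.append((drive_gb, count, total, total / bought_tb))
    return rows


if __name__ == "__main__":
    for drive_gb, count, total, per_tb in compare():
        print(f"{count} x {drive_gb}GB: ${total:.2f} total, ${per_tb:.2f}/TB")
```

At the quoted prices, 2TB of raw capacity comes to $110 with one 2TB drive, $140 with two 1TB drives, and $360 with eight 250GB drives. Note that eight data drives would not fit in the six open SATA ports mentioned above without an extra controller, which is part of the case for fewer, larger disks.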
