On Fri, Feb 11, 2011 at 7:14 PM, Ted Dunning <[email protected]> wrote:
> Bandwidth is definitely better with more active spindles. I would recommend
> several larger disks. The cost is very nearly the same.
>
> On Fri, Feb 11, 2011 at 3:52 PM, Shrinivas Joshi <[email protected]> wrote:
>
>> Thanks for your inputs, Michael. We have 6 open SATA ports on the
>> motherboards. That is why we are thinking of 4 to 5 data disks
>> and 1 OS disk.
>> Are you suggesting using one 2TB disk instead of, let's say, four 500GB disks?
>> I thought that HDFS utilization/throughput increases with the number of disks
>> per node (assuming that the total usable I/O bandwidth increases
>> proportionally).
>>
>> -Shrinivas
>>
>> On Thu, Feb 10, 2011 at 4:25 PM, Michael Segel <[email protected]> wrote:
>>
>>>
>>> Shrinivas,
>>>
>>> Assuming you're in the US, I'd recommend the following:
>>>
>>> Go with 2TB 7200 RPM SATA hard drives.
>>> (Not sure what type of hardware you have.)
>>>
>>> What we've found is that in the data nodes, there's an optimal
>>> configuration that balances price versus performance.
>>>
>>> While your chassis may hold 8 drives, how many open SATA ports are on the
>>> motherboard? Since you're using JBOD, you don't want the additional expense
>>> of having to purchase a separate controller card for the additional drives.
>>>
>>> I'm running Seagate drives at home and I haven't had any problems for
>>> years.
>>> When you look at a drive, you need to know total storage, speed (RPM),
>>> and cache size.
>>> Looking at Microcenter's pricing... a 2TB 3.0Gb/s SATA Hitachi was $110.00,
>>> a 1TB Seagate was $70.00,
>>> and a 250GB SATA drive was $45.00.
>>>
>>> So 2TB = $110, $140, $360 (respectively).
>>>
>>> So you get a better deal on 2TB drives.
>>>
>>> So if you go out and get more drives but of lower density, you'll end up
>>> spending more money and using more energy, but I doubt you'll see a real
>>> performance difference.
>>>
>>> The other thing is that if you want to add more disks, you have room to
>>> grow. (Just add more disks and restart the node, right?)
>>> If all of your disk slots are filled, you're SOL. You have to take the
>>> box out, replace all of the drives, then add it to the cluster as a 'new' node.
>>>
>>> Just my $0.02.
>>>
>>> HTH
>>>
>>> -Mike
>>>
>>>> Date: Thu, 10 Feb 2011 15:47:16 -0600
>>>> Subject: Re: recommendation on HDDs
>>>> From: [email protected]
>>>> To: [email protected]
>>>>
>>>> Hi Ted, Chris,
>>>>
>>>> Much appreciate your quick reply. The reason we are looking for smaller
>>>> capacity drives is that we are not anticipating huge growth in our data
>>>> footprint, and I also read somewhere that the larger the capacity of the
>>>> drive, the more platters it has, and that could affect drive
>>>> performance. But it looks like you can get 1TB drives with only 2 platters.
>>>> Large capacity drives should be OK for us as long as they perform equally
>>>> well.
>>>>
>>>> Also, the systems that we have can host up to 8 SATA drives. In that
>>>> case, would backplanes offer additional advantages?
>>>>
>>>> Any suggestions on 5400 vs. 7200 vs. 10000 RPM disks? I guess 10K RPM disks
>>>> would be overkill considering their perf/cost advantage?
>>>>
>>>> Thanks for your inputs.
>>>>
>>>> -Shrinivas
>>>>
>>>> On Thu, Feb 10, 2011 at 2:48 PM, Chris Collins <[email protected]> wrote:
>>>>
>>>>> Of late we have had serious issues with Seagate drives in our Hadoop
>>>>> cluster. These were purchased over several purchasing cycles, and I'm
>>>>> pretty sure it wasn't just a single "bad batch". Because of this we
>>>>> switched to buying 2TB Hitachi drives, which seem to have been
>>>>> considerably more reliable.
>>>>>
>>>>> Best
>>>>>
>>>>> C
>>>>>
>>>>> On Feb 10, 2011, at 12:43 PM, Ted Dunning wrote:
>>>>>
>>>>>> Get bigger disks. Data only grows and having extra is always good.
>>>>>>
>>>>>> You can get 2TB drives for <$100 and 1TB for <$75.
>>>>>>
>>>>>> As far as transfer rates are concerned, any 3Gb/s SATA drive is going
>>>>>> to be about the same (ish). Seek times will vary a bit with rotation
>>>>>> speed, but with Hadoop, you will be doing long reads and writes.
>>>>>>
>>>>>> Your controller and backplane will have a MUCH bigger vote in getting
>>>>>> acceptable performance. With only 4 or 5 drives, you don't have to worry
>>>>>> about a super-duper backplane, but you can still kill performance with a
>>>>>> lousy controller.
>>>>>>
>>>>>> On Thu, Feb 10, 2011 at 12:26 PM, Shrinivas Joshi <[email protected]> wrote:
>>>>>>
>>>>>>> What would be a good hard drive for a 7 node cluster which is targeted
>>>>>>> to run a mix of I/O and CPU intensive Hadoop workloads? We are looking
>>>>>>> for around 1TB of storage on each node distributed amongst 4 or 5 disks.
>>>>>>> So either 250GB * 4 disks or 160GB * 5 disks. Also it should be less
>>>>>>> than $100 each ;)
>>>>>>>
>>>>>>> I looked at HDD benchmark comparisons on tomshardware, storagereview,
>>>>>>> etc. Got overwhelmed with the number of benchmarks and different
>>>>>>> aspects of HDD performance.
>>>>>>>
>>>>>>> Appreciate your help on this.
>>>>>>>
>>>>>>> -Shrinivas
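The price comparison in Mike's message is easy to check. Below is a minimal Python sketch that uses only the three prices he quoted and assumes decimal capacities (1TB = 1000GB) and that you can only buy whole drives; the function name `cost_for_capacity` is just for illustration.

```python
import math

# Prices quoted in Mike's message (Microcenter, February 2011):
# drive label -> (capacity in GB, price in USD).
PRICES_USD = {
    "2TB Hitachi": (2000, 110.00),
    "1TB Seagate": (1000, 70.00),
    "250GB SATA": (250, 45.00),
}


def cost_for_capacity(target_gb: int, drive_gb: int, price: float) -> tuple[int, float]:
    """Return (drives needed, total cost) to reach at least target_gb.

    Hypothetical helper: rounds up because you can only buy whole drives.
    """
    drives = math.ceil(target_gb / drive_gb)
    return drives, drives * price


if __name__ == "__main__":
    for name, (size_gb, price) in PRICES_USD.items():
        n, total = cost_for_capacity(2000, size_gb, price)
        per_tb = price / size_gb * 1000  # decimal TB, as drives are marketed
        print(f"{name}: {n} drive(s), ${total:.2f} for 2TB, ${per_tb:.2f}/TB")
```

For 2TB of raw capacity this gives 1, 2 and 8 drives at $110, $140 and $360 respectively, i.e. $55, $70 and $180 per TB.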
You also do not need a dedicated OS disk. I typically slice off small partitions on some of the disks and put a software mirror (RAID 1) across them. This gives you redundancy without having to sacrifice one or two disk slots to smaller disks.
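To make the trade-off concrete for a node with 6 SATA ports, here is a rough Python sketch comparing a dedicated OS disk with small OS partitions mirrored across two of the data disks. The drive size (2TB) and OS partition size (50GB) in the example are hypothetical, and the sketch only counts data spindles and raw capacity; it does not model throughput or the controller.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    name: str
    data_spindles: int      # disks that serve HDFS data
    data_capacity_gb: int   # raw capacity left for HDFS


def dedicated_os_disk(ports: int, drive_gb: int) -> Layout:
    # One SATA port goes to the OS disk; the remaining ports hold data disks.
    return Layout("dedicated OS disk", ports - 1, (ports - 1) * drive_gb)


def mirrored_os_slices(ports: int, drive_gb: int, os_slice_gb: int,
                       mirror_members: int = 2) -> Layout:
    # Every port holds a data disk; `mirror_members` of them give up a small
    # slice for the OS, which is mirrored (RAID 1) across those slices.
    if mirror_members > ports:
        raise ValueError("mirror needs more disks than there are ports")
    capacity = ports * drive_gb - mirror_members * os_slice_gb
    return Layout("mirrored OS slices", ports, capacity)


if __name__ == "__main__":
    for layout in (dedicated_os_disk(6, 2000), mirrored_os_slices(6, 2000, 50)):
        print(f"{layout.name}: {layout.data_spindles} data spindles, "
              f"{layout.data_capacity_gb} GB raw for HDFS")
```

With those numbers the mirrored layout keeps all 6 spindles working for HDFS (versus 5) and leaves 11,900GB of raw data space instead of 10,000GB. The cost is that OS I/O now shares two of the spindles with HDFS.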
