Thanks for your input, Michael. We have 6 open SATA ports on the motherboards, which is why we are thinking of 4 to 5 data disks and 1 OS disk. Are you suggesting using one 2TB disk instead of, let's say, four 500GB disks? I thought that HDFS utilization/throughput increases with the # of disks per node (assuming that the total usable IO bandwidth increases proportionally).
-Shrinivas

On Thu, Feb 10, 2011 at 4:25 PM, Michael Segel <[email protected]> wrote:

>
> Shrinivas,
>
> Assuming you're in the US, I'd recommend the following:
>
> Go with 2TB 7200 SATA hard drives.
> (Not sure what type of hardware you have)
>
> What we've found is that in the data nodes, there's an optimal
> configuration that balances price versus performance.
>
> While your chassis may hold 8 drives, how many open SATA ports are on the
> motherboard? Since you're using JBOD, you don't want the additional expense
> of having to purchase a separate controller card for the additional drives.
>
> I'm running Seagate drives at home and I haven't had any problems for
> years.
> When you look at your drive, you need to know total storage, speed (rpms),
> and cache size.
> Looking at Microcenter's pricing...
> A 2TB 3.0Gb/s SATA Hitachi was $110.00
> A 1TB Seagate was $70.00
> A 250GB SATA drive was $45.00
>
> So 2TB = 110, 140, 180 (respectively)
>
> So you get a better deal on 2TB.
>
> So if you go out and get more drives but of lower density, you'll end up
> spending more money and use more energy, but I doubt you'll see a real
> performance difference.
>
> The other thing is that if you want to add more disk, you have room to
> grow. (Just add more disk and restart the node, right?)
> If all of your disk slots are filled, you're SOL. You have to take out the
> box, replace all of the drives, then add to cluster as 'new' node.
>
> Just my $0.02 cents.
>
> HTH
>
> -Mike
>
> > Date: Thu, 10 Feb 2011 15:47:16 -0600
> > Subject: Re: recommendation on HDDs
> > From: [email protected]
> > To: [email protected]
> >
> > Hi Ted, Chris,
> >
> > Much appreciate your quick reply. The reason why we are looking for smaller
> > capacity drives is because we are not anticipating a huge growth in data
> > footprint and also read somewhere that the larger the capacity of the drive,
> > the bigger the number of platters in it, and that could affect drive
> > performance. But it looks like you can get 1TB drives with only 2 platters.
> > Large capacity drives should be OK for us as long as they perform equally
> > well.
> >
> > Also, the systems that we have can host up to 8 SATA drives in them. In that
> > case, would backplanes offer additional advantages?
> >
> > Any suggestions on 5400 vs. 7200 vs. 10000 RPM disks? I guess 10K rpm disks
> > would be overkill comparing their perf/cost advantage?
> >
> > Thanks for your inputs.
> >
> > -Shrinivas
> >
> > On Thu, Feb 10, 2011 at 2:48 PM, Chris Collins <[email protected]> wrote:
> >
> > > Of late we have had serious issues with Seagate drives in our Hadoop
> > > cluster. These were purchased over several purchasing cycles and I'm pretty
> > > sure it wasn't just a single "bad batch". Because of this we switched to
> > > buying 2TB Hitachi drives, which seem to have been considerably more
> > > reliable.
> > >
> > > Best
> > >
> > > C
> > >
> > > On Feb 10, 2011, at 12:43 PM, Ted Dunning wrote:
> > >
> > > > Get bigger disks. Data only grows and having extra is always good.
> > > >
> > > > You can get 2TB drives for <$100 and 1TB for <$75.
> > > >
> > > > As far as transfer rates are concerned, any 3Gb/s SATA drive is going to
> > > > be about the same (ish). Seek times will vary a bit with rotation speed,
> > > > but with Hadoop, you will be doing long reads and writes.
> > > >
> > > > Your controller and backplane will have a MUCH bigger vote in getting
> > > > acceptable performance. With only 4 or 5 drives, you don't have to worry
> > > > about super-duper backplane, but you can still kill performance with a
> > > > lousy controller.
> > > >
> > > > On Thu, Feb 10, 2011 at 12:26 PM, Shrinivas Joshi <[email protected]> wrote:
> > > >
> > > >> What would be a good hard drive for a 7 node cluster which is targeted to
> > > >> run a mix of IO and CPU intensive Hadoop workloads? We are looking for
> > > >> around 1 TB of storage on each node distributed amongst 4 or 5 disks. So
> > > >> either 250GB * 4 disks or 160GB * 5 disks. Also it should be less than
> > > >> 100$ each ;)
> > > >>
> > > >> I looked at HDD benchmark comparisons on tomshardware, storagereview etc.
> > > >> Got overwhelmed with the # of benchmarks and different aspects of HDD
> > > >> performance.
> > > >>
> > > >> Appreciate your help on this.
> > > >>
> > > >> -Shrinivas
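
As a rough check on the price comparison Mike quotes in the thread, here is a minimal Python sketch that works out how many drives of each size it takes to reach 2TB and what that costs. The prices are the Microcenter figures from his message; the names `QUOTED_PRICES`, `cost_to_reach` and `compare` are made up for this illustration, and the sketch ignores power, controller cost and drive slots.

```python
import math

# Prices as quoted in the thread (Microcenter, Feb 2011):
# drive capacity in GB -> unit price in USD.
QUOTED_PRICES = {2000: 110.00, 1000: 70.00, 250: 45.00}


def cost_to_reach(target_gb, capacity_gb, unit_price):
    """Return (drive count, total price) needed to reach target_gb."""
    drives = math.ceil(target_gb / capacity_gb)
    return drives, drives * unit_price


def compare(target_gb=2000, prices=QUOTED_PRICES):
    """Return one row per drive size, largest first:
    (capacity_gb, drives_needed, total_usd, usd_per_tb)."""
    rows = []
    for capacity_gb, price in sorted(prices.items(), reverse=True):
        drives, total = cost_to_reach(target_gb, capacity_gb, price)
        usd_per_tb = price / (capacity_gb / 1000)
        rows.append((capacity_gb, drives, total, usd_per_tb))
    return rows


if __name__ == "__main__":
    for capacity_gb, drives, total, usd_per_tb in compare():
        print(f"{capacity_gb:>5} GB x {drives}: ${total:.2f} total, "
              f"${usd_per_tb:.2f} per TB")
```

With these prices, 2TB costs $110 as one drive, $140 as two 1TB drives, and $360 as eight 250GB drives. The $180 figure in the message matches four 250GB drives (1TB), which is also the per-TB price of the 250GB drive. Either way the conclusion in the thread holds: the 2TB drive is the cheapest per TB.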
