On Fri, Feb 11, 2011 at 7:14 PM, Ted Dunning <[email protected]> wrote:
> Bandwidth is definitely better with more active spindles.  I would recommend
> several larger disks.  The cost is very nearly the same.
>
> On Fri, Feb 11, 2011 at 3:52 PM, Shrinivas Joshi <[email protected]>wrote:
>
>> Thanks for your inputs, Michael.  We have 6 open SATA ports on the
>> motherboards. That is the reason why we are thinking of 4 to 5 data disks
>> and 1 OS disk.
>> Are you suggesting use of one 2TB disk instead of, let's say, four 500GB
>> disks?
>> I thought that HDFS utilization/throughput increases with the # of disks
>> per node (assuming that the total usable IO bandwidth increases
>> proportionally).
>>
>> -Shrinivas
>>
>> On Thu, Feb 10, 2011 at 4:25 PM, Michael Segel <[email protected]
>> >wrote:
>>
>> >
>> > Shrinivas,
>> >
>> > Assuming you're in the US, I'd recommend the following:
>> >
>> > Go with 2TB 7200 SATA hard drives.
>> > (Not sure what type of hardware you have)
>> >
>> > What  we've found is that in the data nodes, there's an optimal
>> > configuration that balances price versus performance.
>> >
>> > While your chassis may hold 8 drives, how many open SATA ports are on the
>> > motherboard? Since you're using JBOD, you don't want the additional
>> > expense of having to purchase a separate controller card for the
>> > additional drives.
>> >
>> > I'm running Seagate drives at home and I haven't had any problems for
>> > years.
>> > When you look at your drive, you need to know total storage, speed
>> (rpms),
>> > and cache size.
>> > Looking at Microcenter's pricing:
>> > A 2TB 3.0Gb/s SATA Hitachi was $110.00
>> > A 1TB Seagate was $70.00
>> > A 250GB SATA drive was $45.00
>> >
>> > So 2TB = 110, 140, 180 (respectively)
>> >
>> > So you get a better deal on 2TB.
>> >
>> > So if you go out and get more drives but of lower density, you'll end up
>> > spending more money and use more energy, but I doubt you'll see a real
>> > performance difference.
>> >
>> > The other thing is that if you want to add more disk, you have room to
>> > grow. (Just add more disk and restart the node, right?)
>> > If all of your disk slots are filled, you're SOL. You have to take out
>> the
>> > box, replace all of the drives, then add to cluster as 'new' node.
>> >
>> > Just my $0.02.
>> >
>> > HTH
>> >
>> > -Mike
>> >
>> > > Date: Thu, 10 Feb 2011 15:47:16 -0600
>> > > Subject: Re: recommendation on HDDs
>> > > From: [email protected]
>> > > To: [email protected]
>> > >
>> > > Hi Ted, Chris,
>> > >
>> > > Much appreciate your quick reply. The reason we are looking for
>> > > smaller capacity drives is that we are not anticipating huge growth
>> > > in data footprint, and we also read somewhere that the larger the
>> > > capacity of the drive, the more platters it has, which could affect
>> > > drive performance. But it looks like you can get 1TB drives with only
>> > > 2 platters. Large capacity drives should be OK for us as long as they
>> > > perform equally well.
>> > >
>> > > Also, the systems that we have can host up to 8 SATA drives in them. In
>> > that
>> > > case, would  backplanes offer additional advantages?
>> > >
>> > > Any suggestions on 5400 vs. 7200 vs. 10000 RPM disks?  I guess 10K rpm
>> > disks
>> > > would be overkill comparing their perf/cost advantage?
>> > >
>> > > Thanks for your inputs.
>> > >
>> > > -Shrinivas
>> > >
>> > > On Thu, Feb 10, 2011 at 2:48 PM, Chris Collins <
>> > [email protected]>wrote:
>> > >
>> > > > Of late we have had serious issues with Seagate drives in our Hadoop
>> > > > cluster.  These were purchased over several purchasing cycles, and we
>> > > > are pretty sure it wasn't just a single "bad batch".  Because of this
>> > > > we switched to buying 2TB Hitachi drives, which seem to have been
>> > > > considerably more reliable.
>> > > >
>> > > > Best
>> > > >
>> > > > C
>> > > > On Feb 10, 2011, at 12:43 PM, Ted Dunning wrote:
>> > > >
>> > > > > Get bigger disks.  Data only grows and having extra is always good.
>> > > > >
>> > > > > You can get 2TB drives for <$100 and 1TB for < $75.
>> > > > >
>> > > > > As far as transfer rates are concerned, any 3Gb/s SATA drive is
>> > > > > going to be about the same (ish).  Seek times will vary a bit with
>> > > > > rotation speed, but with Hadoop, you will be doing long reads and
>> > > > > writes.
>> > > > >
>> > > > > Your controller and backplane will have a MUCH bigger vote in
>> getting
>> > > > > acceptable performance.  With only 4 or 5 drives, you don't have to
>> > worry
>> > > > > about super-duper backplane, but you can still kill performance
>> with
>> > a
>> > > > lousy
>> > > > > controller.
>> > > > >
>> > > > > On Thu, Feb 10, 2011 at 12:26 PM, Shrinivas Joshi <
>> > [email protected]
>> > > > >wrote:
>> > > > >
>> > > > >> What would be a good hard drive for a 7 node cluster which is
>> > targeted
>> > > > to
>> > > > >> run a mix of IO and CPU intensive Hadoop workloads? We are looking
>> > for
>> > > > >> around 1 TB of storage on each node distributed amongst 4 or 5
>> > disks. So
>> > > > >> either 250GB * 4 disks or 160GB * 5 disks. Also it should be less
>> > than
>> > > > 100$
>> > > > >> each ;)
>> > > > >>
>> > > > >> I looked at HDD benchmark comparisons on tomshardware,
>> storagereview
>> > > > etc.
>> > > > >> Got overwhelmed with the # of benchmarks and different aspects of
>> > HDD
>> > > > >> performance.
>> > > > >>
>> > > > >> Appreciate your help on this.
>> > > > >>
>> > > > >> -Shrinivas
>> > > > >>
>> > > >
>> > > >
>> > > >
>> >
>> >
>>
>

You also do not need a dedicated OS disk. I typically carve a small
partition out of a couple of the data disks and set up a software mirror
across those partitions for the OS. This gives you redundancy without
having to sacrifice one or two disk slots to smaller disks.
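
On the pricing quoted further up the thread, here is a rough sketch of the
cost of reaching 2TB with each drive size. The capacities and prices are
the ones Mike listed; the function names and the dictionary layout are my
own, purely for illustration, and capacities are taken as round decimal
GB.

```python
import math

# Capacities (GB) and unit prices (USD) as quoted in the thread.
# The dictionary layout and names are illustrative assumptions.
QUOTED_DRIVES = {
    "2TB Hitachi": (2000, 110.00),
    "1TB Seagate": (1000, 70.00),
    "250GB SATA": (250, 45.00),
}


def cost_to_reach(target_gb, capacity_gb, unit_price):
    """Return (drive_count, total_cost) needed to reach target_gb."""
    count = math.ceil(target_gb / capacity_gb)
    return count, count * unit_price


def compare(target_gb=2000, drives=QUOTED_DRIVES):
    """Map each drive name to (drive_count, total_cost) for target_gb."""
    return {
        name: cost_to_reach(target_gb, capacity, price)
        for name, (capacity, price) in drives.items()
    }


if __name__ == "__main__":
    for name, (count, total) in compare().items():
        print(f"{name}: {count} drive(s), ${total:.2f}")
```

At those prices 2TB comes to $110 with one drive, $140 with two 1TB
drives, and $360 with eight 250GB drives. Eight drives is also more than
the six open SATA ports mentioned above, so the small-drive option would
need an extra controller as well.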
