On Thu, May 28, 2009 at 10:24 AM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
> We do both -- push the disk image out to NFS and have mirrored SAS hard
> drives on the namenode. The SAS drives appear to be overkill.

This sounds like a nice approach, taking hardware, labor, and downtime
costs into account. $700 for a RAID controller seems reasonable to
minimize maintenance after a disk failure. Alex's suggestion to go JBOD
and write to all volumes would work as well, but is slightly more labor
intensive.

>>> 2. What is a good processor-to-storage ratio for a task node with 4TB
>>> of raw storage? (The config above has 1 core per 1TB of raw storage.)

> We're data hungry locally -- I'd put in bigger hard drives. The 1.5TB
> Seagate drives seem to have passed their teething issues, and are at a
> pretty sweet price point. They will only scale up to 60 IOPS, so make
> sure your workflows don't have lots of random I/O.

I haven't seen many vendors offering the 1.5TB option. What type of data
are you working with, and at what volume? I sense that at 50GB/day, we
are higher than average in terms of data volume over time.

> As Steve mentions below, the rest is really up to your algorithm. Do you
> need 1 CPU second / byte? If so, buy more CPUs. Do you need .1 CPU
> second / MB? If so, buy more disks.

Unfortunately, we won't know until we have a cluster to test on -- a
classic catch-22. We are going to experiment with a small cluster and a
small data set, with plans to buy more appropriately sized slave nodes
based on what we learn.

- P
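For reference, the JBOD-plus-NFS approach discussed above is expressed in Hadoop by giving dfs.name.dir a comma-separated list of directories in hdfs-site.xml; the namenode then writes its metadata to every listed volume. A minimal sketch -- the local paths and the NFS mount point are made-up examples, not anything from this thread:

```
<!-- hdfs-site.xml sketch: dfs.name.dir accepts a comma-separated list of
     directories, and the namenode mirrors its image/edits to all of them.
     /data1, /data2, and /mnt/nfs/namenode are hypothetical paths. -->
<property>
  <name>dfs.name.dir</name>
  <value>/data1/namenode,/data2/namenode,/mnt/nfs/namenode</value>
</property>
```

Losing one volume then costs a config edit and a namenode restart rather than a RAID rebuild, which is the "slightly more labor intensive" trade-off mentioned above.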
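Brian's CPU-second-per-byte rule of thumb can be turned into a quick back-of-envelope check before buying anything. A sketch, assuming a 4-core node with 4 x 1TB drives and ~60 MB/s sequential throughput per drive -- all of these figures are illustrative assumptions, not measurements from this thread:

```python
# Back-of-envelope sizing: for a full scan of one node's storage, compare
# how long the CPUs need versus how long the disks need. Whichever takes
# longer is the bottleneck, which tells you whether to buy cores or spindles.

def bottleneck(cpu_sec_per_mb, cores, storage_tb, disk_mb_per_sec, disks):
    """Return ('cpu'|'disk', cpu_hours, disk_hours) for a full-data scan."""
    data_mb = storage_tb * 1e6                              # ~1e6 MB per TB
    cpu_hours = data_mb * cpu_sec_per_mb / cores / 3600.0
    disk_hours = data_mb / (disk_mb_per_sec * disks) / 3600.0
    return ("cpu" if cpu_hours > disk_hours else "disk", cpu_hours, disk_hours)

# Light algorithm (0.01 CPU s/MB): the disks are the limit.
print(bottleneck(0.01, cores=4, storage_tb=4, disk_mb_per_sec=60, disks=4))
# Heavy algorithm (1 CPU s/MB): the CPUs are the limit -- buy more cores.
print(bottleneck(1.0, cores=4, storage_tb=4, disk_mb_per_sec=60, disks=4))
```

The crossover point is exactly Brian's heuristic: once per-MB CPU cost drops toward 0.1 s/MB and below, spindles rather than cores become the constraint.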