On Thu, May 28, 2009 at 10:24 AM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
> We do both -- push the disk image out to NFS and have mirrored SAS hard
> drives on the namenode. The SAS drives appear to be overkill.

This sounds like a nice approach, taking hardware, labor, and downtime
costs into account. $700 for a RAID controller seems reasonable to
minimize maintenance after a disk failure. Alex's suggestion to go JBOD
and write to all volumes would work as well, but is slightly more labor
intensive.

>>> 2. What is a good processor-to-storage ratio for a task node with 4TB
>>> of raw storage? (The config above has 1 core per 1TB of raw storage.)

> We're data hungry locally -- I'd put in bigger hard drives. The 1.5TB
> Seagate drives seem to have passed their teething issues, and are at a
> pretty sweet price point. They will only scale up to 60 IOPS, so make
> sure your workflows don't have lots of random I/O.

I haven't seen many vendors offering the 1.5TB option. What type of data
are you working with, and at what volume? I sense that at 50GB/day, we
are higher than average in terms of data volume over time.

> As Steve mentions below, the rest is really up to your algorithm. Do you
> need 1 CPU second / byte? If so, buy more CPUs. Do you need .1 CPU
> second / MB? If so, buy more disks.

Unfortunately, we won't know until we have a cluster to test on -- a
classic catch-22. We are going to experiment with a small cluster and a
small data set, with plans to buy more appropriately sized slave nodes
based on what we learn.

- P
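For reference, the JBOD-plus-NFS approach discussed above is expressed in Hadoop by giving dfs.name.dir a comma-separated list of directories in hdfs-site.xml; the namenode then writes its metadata to every listed volume. A minimal sketch -- the local paths and the NFS mount point are made-up examples, not anything from this thread:

```
<!-- hdfs-site.xml sketch: dfs.name.dir accepts a comma-separated list of
     directories, and the namenode mirrors its image/edits to all of them.
     /data1, /data2, and /mnt/nfs/namenode are hypothetical paths. -->
<property>
  <name>dfs.name.dir</name>
  <value>/data1/namenode,/data2/namenode,/mnt/nfs/namenode</value>
</property>
```

Losing one volume then costs a config edit and a namenode restart rather than a RAID rebuild, which is the "slightly more labor intensive" trade-off mentioned above.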
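Brian's CPU-second-per-byte rule of thumb can be turned into a quick back-of-envelope check before buying anything. A sketch, assuming a 4-core node with 4 x 1TB drives and ~60 MB/s sequential throughput per drive -- all of these figures are illustrative assumptions, not measurements from this thread:

```python
# Back-of-envelope sizing: for a full scan of one node's storage, compare
# how long the CPUs need versus how long the disks need. Whichever takes
# longer is the bottleneck, which tells you whether to buy cores or spindles.

def bottleneck(cpu_sec_per_mb, cores, storage_tb, disk_mb_per_sec, disks):
    """Return ('cpu'|'disk', cpu_hours, disk_hours) for a full-data scan."""
    data_mb = storage_tb * 1e6                              # ~1e6 MB per TB
    cpu_hours = data_mb * cpu_sec_per_mb / cores / 3600.0
    disk_hours = data_mb / (disk_mb_per_sec * disks) / 3600.0
    return ("cpu" if cpu_hours > disk_hours else "disk", cpu_hours, disk_hours)

# Light algorithm (0.01 CPU s/MB): the disks are the limit.
print(bottleneck(0.01, cores=4, storage_tb=4, disk_mb_per_sec=60, disks=4))
# Heavy algorithm (1 CPU s/MB): the CPUs are the limit -- buy more cores.
print(bottleneck(1.0, cores=4, storage_tb=4, disk_mb_per_sec=60, disks=4))
```

The crossover point is exactly Brian's heuristic: once per-MB CPU cost drops toward 0.1 s/MB and below, spindles rather than cores become the constraint.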