Things to consider are cost, reliability, scalability, and what equipment you 
might already own.

- SAN / NAS: generally less reliable than HDFS in terms of "how much data do 
you lose if lightning strikes a box?".  Many SAN/NAS solutions start with the 
assumption that a given piece of hardware will never fail; I have found this to 
be a lousy assumption at our site.
  - At today's disk failure rates, you can expect 2 dead disks a day for a 
petabyte-scale solution.  Keep this in mind for your plans.  An HDFS-based 
solution will recover nicely from disk deaths.
- Local DAS can be more scalable, depending on your application.
- If you already own a SAN/NAS and it is sufficient for your install, don't 
throw out the equipment.  Use it.
- Local DAS comes in cheaper *if* you need to buy the computational power 
anyway.
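
To put a rough number on the disk-death point for your own plans, here is a 
back-of-the-envelope sketch.  The function names, the 1 TB disk size, and the 
5% annual failure rate are all placeholder assumptions of mine, not measured 
figures; the replication factor of 3 is the HDFS default.  Plug in your own 
site's numbers.

```python
import math


def disks_needed(usable_tb: float, disk_tb: float, replication: int) -> int:
    """Raw disk count needed to hold usable_tb of data at a given replication."""
    return math.ceil(usable_tb * replication / disk_tb)


def expected_failures_per_day(num_disks: int, annual_failure_rate: float) -> float:
    """Average number of disk deaths per day, assuming independent failures."""
    return num_disks * annual_failure_rate / 365.0


# Hypothetical inputs: 1 PB usable, 1 TB disks, 3 replicas, 5% annual failure rate.
disks = disks_needed(usable_tb=1000, disk_tb=1.0, replication=3)
per_day = expected_failures_per_day(disks, annual_failure_rate=0.05)
print(f"{disks} disks, about {per_day:.2f} expected failures per day")
```

The result scales linearly with both the disk count and the failure rate, so 
a site with more spindles or older hardware will see proportionally more 
dead disks.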

A lot of this comes down to what your operations staff is used to.
- If you have deep experience with a vendor-supported file system (e.g., GPFS), 
I'd recommend continuing to use it.
- If you have no background in this area, you would probably benefit from 
Hadoop support from a company like Cloudera.

Hope this helps - you didn't give much background on your specific situation, 
so I can only answer in very general terms.

Brian

On Dec 22, 2009, at 10:24 AM, Doopah Shaf wrote:

> Does anyone have any recommendations for / against using a NAS / SAN system
> as the underlying physical storage for a hadoop cluster, instead of local
> data node DAS?
