Things to consider are cost, reliability, scalability, and what equipment you might already own.
- SAN / NAS: generally less reliable than HDFS in terms of "how much data do you lose if lightning strikes a box?". Many SAN/NAS solutions start with the assumption that a given piece of hardware will never fail; I have found this to be a lousy assumption at our site.
- At today's disk failure rates, you can expect 2 dead disks a day for a petabyte-scale solution. Keep this in mind for your plans. An HDFS-based solution will recover nicely from disk deaths.
- Local DAS can be more scalable, depending on your application.
- If you already own a SAN/NAS and it is sufficient for your install, don't throw out the equipment. Use it.
- Local DAS comes in cheaper *if* you need to buy the computational power anyway.

A lot of this comes down to what your operations staff is used to.
- If you have deep experience with a vendor-supported file system (e.g., GPFS), I'd recommend continuing to use it.
- If you have no background in this area, you would probably benefit from Hadoop support from a company like Cloudera.

Hope this helps - you didn't give much background on your specific situation, so I can only answer in very general terms.

Brian

On Dec 22, 2009, at 10:24 AM, Doopah Shaf wrote:

> Does anyone have any recommendations for / against using a NAS / SAN system
> as the underlying physical storage for a hadoop cluster, instead of local
> data node DAS?
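
To sanity-check the disk-failure point in the reply above against your own cluster, a back-of-the-envelope calculation is usually enough. The following is a minimal sketch, not a measurement: the function names are made up for illustration, failures are assumed to be independent and to occur at a constant annualized failure rate, and the disk size, replication factor, and failure rate are placeholders you should replace with your own numbers.

```python
import math


def raw_disks_needed(usable_tb: float, disk_tb: float, replication: int) -> int:
    """Number of disks needed to hold `usable_tb` of data at the given
    replication factor, ignoring filesystem overhead and spare capacity."""
    return math.ceil(usable_tb * replication / disk_tb)


def expected_disk_failures_per_day(num_disks: int, annual_failure_rate: float) -> float:
    """Expected number of failed disks per day, assuming independent
    failures at a constant annualized failure rate (e.g. 0.05 for 5%)."""
    return num_disks * annual_failure_rate / 365.0


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not figures from this thread):
    # 1 PB usable, 1 TB disks, 3 replicas, 5% annualized failure rate.
    disks = raw_disks_needed(usable_tb=1000, disk_tb=1, replication=3)
    per_day = expected_disk_failures_per_day(disks, annual_failure_rate=0.05)
    print(f"{disks} disks, about {per_day:.2f} expected failures per day")
```

The result scales linearly with both the disk count and the failure rate, so the daily figure for a given site depends heavily on how many spindles sit behind each petabyte and on the failure rates actually observed there.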
