My setup is an existing farm built around a central NetApp. We are looking to scale out and are considering Hadoop as a data processing / DWH alternative. Do these details change your answer in any way? Thanks.
On Tue, Dec 22, 2009 at 6:34 PM, Brian Bockelman <[email protected]> wrote:

> Things to consider are cost, reliability, scalability, and what equipment
> you might already own.
>
> - SAN / NAS: generally less reliable than HDFS in terms of "how much data
> do you lose if lightning strikes a box?". Many SAN/NAS solutions start with
> the assumption that a given piece of hardware will never fail; I have found
> this to be a lousy assumption at our site.
> - At today's disk failure rates, you can expect 2 dead disks a day for a
> petabyte scale solution. Keep this in mind for your plans. A HDFS-based
> solution will recover nicely from disk deaths.
> - local DAS can be more scalable depending on your application.
> - If you already own a SAN/NAS and it is sufficient for your install, don't
> throw out the equipment. Use it.
> - local DAS comes in cheaper *if* you need to buy the computational power
> anyway.
>
> A lot of this comes down to what your operations staff is used to.
> - If you have deep experience with a vendor-supported file system (i.e.,
> GPFS), I'd recommend continuing to use it.
> - If you have no background in this area, you would probably benefit from
> Hadoop support from a company like Cloudera.
>
> Hope this helps - you didn't give much background into your specific
> situation, so I can only answer in very general terms.
>
> Brian
>
> On Dec 22, 2009, at 10:24 AM, Doopah Shaf wrote:
>
> > Does anyone have any recommendations for / against using a NAS / SAN system
> > as the underlying physical storage for a hadoop cluster, instead of local
> > data node DAS?
