My setup is an existing farm built around a central NetApp. We are looking to
scale out and are considering Hadoop as a data processing / DWH alternative.
Do these details change your answer?
Thanks.

On Tue, Dec 22, 2009 at 6:34 PM, Brian Bockelman <[email protected]> wrote:

> Things to consider are cost, reliability, scalability, and what equipment
> you might already own.
>
> - SAN / NAS: generally less reliable than HDFS in terms of "how much data
> do you lose if lightning strikes a box?".  Many SAN/NAS solutions start with
> the assumption that a given piece of hardware will never fail; I have found
> this to be a lousy assumption at our site.
>  - At today's disk failure rates, you can expect 2 dead disks a day for a
> petabyte-scale solution.  Keep this in mind for your plans.  An HDFS-based
> solution will recover nicely from disk deaths.
> - local DAS can be more scalable depending on your application.
> - If you already own a SAN/NAS and it is sufficient for your install, don't
> throw out the equipment.  Use it.
> - local DAS comes in cheaper *if* you need to buy the computational power
> anyway.
>
> A lot of this comes down to what your operations staff is used to.
> - If you have deep experience with a vendor-supported file system (e.g.,
> GPFS), I'd recommend continuing to use it.
> - If you have no background in this area, you would probably benefit from
> Hadoop support from a company like Cloudera.
>
> Hope this helps - you didn't give much background into your specific
> situation, so I can only answer in very general terms.
>
> Brian
>
> On Dec 22, 2009, at 10:24 AM, Doopah Shaf wrote:
>
> > Does anyone have any recommendations for / against using a NAS / SAN system
> > as the underlying physical storage for a hadoop cluster, instead of local
> > data node DAS?
>
>
