I have the same idea in mind. I want to install a Hadoop cluster on
several blade servers that are connected to a disk array. Data is
stored in the disk array, and the blade servers are used only for
computation.
Can MapReduce benefit from such an architecture?
Could the communication between the blade servers and the disk array become the bottleneck?
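
To make the bottleneck question concrete, here is a rough back-of-envelope
sketch. All of the numbers (node count, disk speed, array link speed) and the
function names are made-up placeholders for illustration, not measurements,
and the model ignores caching, replication traffic, and controller limits.
It only compares per-node read bandwidth for local disks against one shared
link to the array:

```python
# Back-of-envelope comparison; every number below is an assumed placeholder,
# not a measurement from any real cluster.

def aggregate_local_mb_s(nodes: int, disks_per_node: int, disk_mb_s: float) -> float:
    """Total sequential read bandwidth when each node reads its own local disks."""
    return nodes * disks_per_node * disk_mb_s


def per_node_shared_mb_s(nodes: int, array_link_mb_s: float) -> float:
    """Bandwidth each node gets when all nodes share one link to the disk array."""
    return array_link_mb_s / nodes


nodes = 16                 # assumed number of blade servers
array_link_mb_s = 800.0    # assumed total throughput of the array link

local_total = aggregate_local_mb_s(nodes, disks_per_node=2, disk_mb_s=80.0)
local_per_node = local_total / nodes
shared_per_node = per_node_shared_mb_s(nodes, array_link_mb_s)

print(f"local disks:  {local_per_node:.0f} MB/s per node, {local_total:.0f} MB/s total")
print(f"shared array: {shared_per_node:.0f} MB/s per node, {array_link_mb_s:.0f} MB/s total")
```

In this simplified model the local-disk total grows as nodes are added, while
the shared link is divided among them, which is why I suspect the array link
could become the limit as the cluster scales.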

Thanks,
Weiming

On Wed, Dec 23, 2009 at 2:58 AM, Doopah Shaf <[email protected]> wrote:
> My setup is an existing farm based on a central NetApp. I am looking to scale
> out and am considering Hadoop as a data processing / DWH alternative. Does
> this add any relevant details to the answer?
> Thanks.
>
> On Tue, Dec 22, 2009 at 6:34 PM, Brian Bockelman <[email protected]> wrote:
>
>> Things to consider are cost, reliability, scalability, and what equipment
>> you might already own.
>>
>> - SAN / NAS: generally less reliable than HDFS in terms of "how much data
>> do you lose if lightning strikes a box?".  Many SAN/NAS solutions start with
>> the assumption that a given piece of hardware will never fail; I have found
>> this to be a lousy assumption at our site.
>>  - At today's disk failure rates, you can expect 2 dead disks a day for a
>> petabyte scale solution.  Keep this in mind for your plans.  An HDFS-based
>> solution will recover nicely from disk deaths.
>> - local DAS can be more scalable depending on your application.
>> - If you already own a SAN/NAS and it is sufficient for your install, don't
>> throw out the equipment.  Use it.
>> - local DAS comes in cheaper *if* you need to buy the computational power
>> anyway.
>>
>> A lot of this comes down to what your operations staff is used to.
>> - If you have deep experience with a vendor-supported file system (e.g.,
>> GPFS), I'd recommend continuing to use it.
>> - If you have no background in this area, you would probably benefit from
>> Hadoop support from a company like Cloudera.
>>
>> Hope this helps - you didn't give much background into your specific
>> situation, so I can only answer in very general terms.
>>
>> Brian
>>
>> On Dec 22, 2009, at 10:24 AM, Doopah Shaf wrote:
>>
>> > Does anyone have any recommendations for / against using a NAS / SAN
>> > system as the underlying physical storage for a Hadoop cluster, instead
>> > of local data node DAS?
>>
>>
>
