Thanks Todd for the reply.

A few more questions:
If the dead node comes back up (with a different IP address and RSA key than
before), can we still use the data it holds?
If we just get password-less login working again, would the data on that node
be usable without having to format the namenode?

Thanks,
- Bhushan
-----Original Message-----
From: Todd Lipcon [mailto:[email protected]]
Sent: Wednesday, July 29, 2009 11:58 PM
To: [email protected]
Subject: Re: To retrieve data on dead node

On Wed, Jul 29, 2009 at 8:51 AM, bhushan_mahale <
[email protected]> wrote:

> Hi,
>
> What are the possible ways to retrieve the data if a node goes down in a
> Hadoop cluster?
>
> Assuming a replication factor of 3, if 3 nodes go down in a 10-node
> cluster, how do we retrieve the data?
>

Hi Bhushan,

If 3 nodes go down at the same time, some of your data will become
inaccessible. If you cannot recover at least one of those nodes, you will
have no way to recover the data. If you can recover at least one, then the
blocks will become available at replication count 1. The NN will notice the
underreplicated blocks and trigger rereplication to get them back up to 3.

If your nodes fail one-by-one with some time in between, the NN should have
time to trigger rereplication between them and the blocks will never be
inaccessible.
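
To get a rough feel for how much data is at risk in the 3-of-10 case, here
is a minimal Python sketch. It assumes each block's replicas land on
distinct nodes chosen uniformly at random, which is a simplification and
not how HDFS actually places replicas. The function names are made up for
illustration.

```python
from math import comb
import random


def fraction_all_replicas_lost(num_nodes, replication, failed):
    """Chance that every replica of one block sits on a failed node.

    Assumes uniform random placement on distinct nodes (a simplification).
    """
    return comb(failed, replication) / comb(num_nodes, replication)


def simulate_failure(num_blocks, num_nodes=10, replication=3, failed=3, seed=0):
    """Place blocks at random, fail some nodes, and count the damage.

    Returns (lost, under_replicated): blocks with no live replica, and
    blocks with at least one live replica but fewer than `replication`.
    """
    rng = random.Random(seed)
    dead = set(range(failed))  # which nodes fail does not matter under uniform placement
    lost = 0
    under_replicated = 0
    for _ in range(num_blocks):
        replicas = set(rng.sample(range(num_nodes), replication))
        live = len(replicas - dead)
        if live == 0:
            lost += 1
        elif live < replication:
            under_replicated += 1
    return lost, under_replicated


if __name__ == "__main__":
    # With 10 nodes, replication 3 and 3 failures this is 1/120.
    print(fraction_all_replicas_lost(10, 3, 3))
    print(simulate_failure(100000))
```

Under that assumption only about 1 block in 120 loses all of its replicas,
but in a cluster with many blocks that is still real data loss. The
under-replicated blocks are the ones the NN can bring back up to 3.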

In general, simultaneous failures occur in two ways in the datacenter. One
is that the entire datacenter loses power (or is forced to shut down after
losing cooling). In this case, no amount of replication within the DC will
help. The other is that power (or network) is lost to an entire rack, due
to either a failed switch or a failed PDU. If you've configured Hadoop's
rack-awareness, it will ensure that each block is replicated on at least
two racks to mitigate the downside of a rack loss.

Depending on your particular setup, it may be worth spreading your 10-node
cluster across separate power circuits and configuring each circuit as a
separate "rack" in Hadoop, if you're concerned about flaky rack PDUs.
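
As a sketch of what that could look like: Hadoop's rack-awareness can call
an external script that receives one or more node addresses as arguments
and prints a rack path for each. The script below is a minimal Python
example of that idea. The IP addresses and rack names are hypothetical, and
the property that points Hadoop at the script (topology.script.file.name in
releases of this era, as far as I recall) should be checked against the
documentation for your version.

```python
#!/usr/bin/env python3
import sys

# Hypothetical mapping from node IP to the power circuit it is plugged into.
# Replace these with the real addresses of your nodes.
RACK_BY_HOST = {
    "10.0.0.1": "/circuit-a",
    "10.0.0.2": "/circuit-a",
    "10.0.0.3": "/circuit-a",
    "10.0.0.4": "/circuit-b",
    "10.0.0.5": "/circuit-b",
    "10.0.0.6": "/circuit-b",
}

# Rack reported for any address that is not in the table above.
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print one rack path per argument, one per line.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

Keeping the mapping in a plain table makes it easy to audit which nodes
share a circuit. Unknown addresses fall back to a default rack rather than
causing the script to fail.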

Hope that helps
-Todd


> Thanks,
> - Bhushan
>
>
> DISCLAIMER
> ==========
> This e-mail may contain privileged and confidential information which is
> the property of Persistent Systems Ltd. It is intended only for the use of
> the individual or entity to which it is addressed. If you are not the
> intended recipient, you are not authorized to read, retain, copy, print,
> distribute or use this message. If you have received this communication in
> error, please notify the sender and delete all copies of this message.
> Persistent Systems Ltd. does not accept any liability for virus infected
> mails.
>
