On Wed, Jul 29, 2009 at 8:51 AM, bhushan_mahale < [email protected]> wrote:
> Hi,
>
> What are the possible ways to retrieve the data if a node goes down in a
> Hadoop cluster?
>
> Assuming replication factor as 3, and 3 nodes goes down in a 10 node
> cluster, how do we retrieve the data?
>

Hi Bhushan,

If 3 nodes go down at the same time, some of your data will become
inaccessible: namely, any block whose three replicas all lived on those
three nodes. If you cannot recover at least one of those nodes, you will
have no way to recover that data. If you can recover at least one, then
the affected blocks will become available again at replication count 1.
The NameNode (NN) will notice the under-replicated blocks and trigger
re-replication to get them back up to 3.

If your nodes fail one by one with some time in between, the NN should
have time to trigger re-replication between failures, and the blocks
will never become inaccessible.

In general, simultaneous failures occur in two ways in the datacenter.
One is that the entire datacenter loses power (or is forced to shut down
due to lost cooling). In this case, no amount of replication within the
DC will help. The other is that power (or network) is lost to an entire
rack, due to either a switch failure or a failed PDU. If you've
configured Hadoop's rack awareness, it will ensure that each block is
replicated on at least two racks to mitigate the impact of a rack loss.

Depending on your particular setup, it may be worth spreading your
10-node cluster across separate power circuits and configuring them as
separate "racks" in Hadoop, if you're concerned about flaky rack PDUs.

Hope that helps
-Todd

> Thanks,
> - Bhushan
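To put a rough number on "some of your data" in the 3-of-10 scenario above, here is a back-of-the-envelope sketch in Python. It assumes each block's replicas sit on distinct nodes chosen uniformly at random and that blocks are placed independently; real HDFS placement is not uniform, so treat this as an illustration only. The function names are made up for this example.

```python
from math import comb


def fraction_blocks_lost(nodes, replication, failed):
    """Chance that one block has ALL of its replicas on the failed nodes.

    Assumption (not HDFS's real policy): replicas are placed on
    `replication` distinct nodes chosen uniformly at random.
    """
    if failed < replication:
        # At least one replica must survive somewhere.
        return 0.0
    return comb(failed, replication) / comb(nodes, replication)


def prob_any_block_lost(nodes, replication, failed, blocks):
    """Chance that at least one of `blocks` blocks is lost.

    Assumption: blocks are placed independently of each other.
    """
    p = fraction_blocks_lost(nodes, replication, failed)
    return 1.0 - (1.0 - p) ** blocks


if __name__ == "__main__":
    # 10 nodes, replication factor 3, 3 simultaneous failures.
    print(fraction_blocks_lost(10, 3, 3))       # 1 in C(10, 3) = 1/120
    print(prob_any_block_lost(10, 3, 3, 1000))  # chance that something is lost
```

Under these assumptions each block has a 1-in-120 chance of being hit, so on a cluster holding more than a handful of blocks it is close to certain that at least a few become inaccessible, while the large majority remain readable.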

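For the "separate power circuits as racks" idea, rack awareness is normally set up by pointing Hadoop at a topology script: Hadoop calls the script with one or more IPs or hostnames as arguments and reads back one rack path per argument from stdout. A minimal sketch follows. The IP addresses and the /circuit-a and /circuit-b rack names are invented for illustration, and the configuration property that names the script differs between Hadoop versions (topology.script.file.name in older releases, net.topology.script.file.name in later ones), so check the docs for your release.

```python
#!/usr/bin/env python3
import sys

# Hypothetical mapping from DataNode address to power circuit.
# Replace with your own hosts; these addresses are made up.
RACK_BY_HOST = {
    "10.0.0.1": "/circuit-a",
    "10.0.0.2": "/circuit-a",
    "10.0.0.3": "/circuit-a",
    "10.0.0.4": "/circuit-a",
    "10.0.0.5": "/circuit-a",
    "10.0.0.6": "/circuit-b",
    "10.0.0.7": "/circuit-b",
    "10.0.0.8": "/circuit-b",
    "10.0.0.9": "/circuit-b",
    "10.0.0.10": "/circuit-b",
}

# Rack name Hadoop itself uses for nodes it cannot place.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host):
    """Return the rack path for one host, falling back to the default."""
    return RACK_BY_HOST.get(host, DEFAULT_RACK)


def format_racks(hosts):
    """Return one rack path per input host, in order, space-separated."""
    return " ".join(resolve_rack(h) for h in hosts)


if __name__ == "__main__":
    # Hadoop passes the hosts to resolve as command-line arguments.
    print(format_racks(sys.argv[1:]))
```

With a mapping like this, losing one circuit should still leave at least one replica of each block on the other, provided the blocks were written after rack awareness was enabled.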