Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by KonstantinShvachko:
http://wiki.apache.org/hadoop/NameNodeFailover

The comment on the change is: Inconsistent

------------------------------------------------------------------------------

The name node is a critical resource for the cluster because data nodes do not know enough about the blocks they hold to coherently answer requests for anything but the block contents. This is not generally a serious problem, because single machines are typically fairly reliable; it is only with a large cluster that we expect daily or hourly failures.

That said, there is a secondary name node that talks to the primary name node on a regular basis in order to keep track of the files in the system. It does this by copying the fsimage and edits files from the primary name node.

If the name node dies, the simplest procedure is to use DNS to swap the names of the primary and secondary name nodes. The secondary name node will serve as the primary as long as nodes request metadata from it. Once you get your old primary back up, reconfigure it to be the secondary name node and you will be back in full operation.

Note that the secondary name node only copies information every few minutes. For a more up-to-date recovery, you can make the name node log transactions to multiple directories, including a network-mounted one. You can then copy the fsimage and edits files from that networked directory and have a recovery that is up to the second.

Questions I still have include:

 * What do you have to do to the old primary to make it a secondary?
 * Can you have more than one secondary name node (for off-site backup purposes)?
 * Are there plans for distributing the name node function?

=== Answer ===

The secondary name node is not a failover mechanism. It is a helper process for the name node and is of no help if the name node fails. The name is possibly misleading.
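The up-to-the-second recovery described above can be sketched as a small shell routine. This is only a sketch: the directory layout (a `current/` subdirectory holding `fsimage` and `edits`) matches Hadoop's name-node storage format, but the source and destination paths are hypothetical and must match your own configuration.

```shell
#!/bin/sh
# Sketch: restore name node metadata from the surviving network-mounted
# name directory into a replacement node's local name directory, before
# starting the new name node. Paths are illustrative assumptions.

# copy_metadata SRC DST
#   SRC: the network-mounted name directory (e.g. an NFS mount)
#   DST: the replacement node's local dfs.name.dir
copy_metadata() {
    src="$1"
    dst="$2"
    mkdir -p "$dst/current"
    cp "$src/current/fsimage" "$dst/current/fsimage"
    cp "$src/current/edits"   "$dst/current/edits"
}
```

Run this while the name node is stopped; copying the files from under a live name node can yield an inconsistent image.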
In order to provide redundancy for data protection in case of name node failure, the best way is to store the name node metadata on a different machine. Hadoop has an option to use multiple name node directories, and the recommended setup is to place one of them on an NFS share. However, you have to make sure NFS locking will not cause problems, and it is NOT recommended to change this on a live system because it can corrupt the name node data. Another option is to simply copy the name node metadata to another machine.

--Ankur Sethi

'''Question'''

Why not keep the fsimage and editlog in the DFS itself (in some way that lets data nodes locate them without the name node)? Then, when the name node fails, a data node would become the new name node through an election mechanism.

--Cosmin Lehene
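The multiple-name-directory setup above is configured in `hdfs-site.xml`. A minimal sketch, assuming a local disk path and an NFS mount point that are purely illustrative:

```xml
<!-- hdfs-site.xml fragment (sketch): dfs.name.dir takes a
     comma-separated list of directories; the name node writes its
     metadata to all of them. /var/hadoop/name and /mnt/namenode-nfs
     are example paths, not defaults. -->
<property>
  <name>dfs.name.dir</name>
  <value>/var/hadoop/name,/mnt/namenode-nfs/name</value>
</property>
```

As noted above, do not add or remove directories from this list on a live system.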
