Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by KonstantinShvachko:
http://wiki.apache.org/hadoop/NameNodeFailover

The comment on the change is: Inconsistent

------------------------------------------------------------------------------

The name node is a critical resource for the cluster because data nodes do not know enough about the blocks they hold to coherently answer requests for anything but the block contents. This is not generally a serious problem, because single machines are typically fairly reliable; it is only with a large cluster that we expect daily or hourly failures.

That said, there is a secondary name node that talks to the primary name node on a regular basis in order to keep track of the files in the system. It does this by copying the fsimage and edits files from the primary name node.

If the name node dies, the simplest procedure is to use DNS to swap the names of the primary and secondary name nodes. The secondary name node will serve as the primary as long as nodes request metadata from it. Once you get your old primary back up, reconfigure it to be the secondary name node and you will be back in full operation.

Note that the secondary name node only copies information every few minutes. For a more up-to-date recovery, you can make the name node log transactions to multiple directories, including a network-mounted one. You can then copy the fsimage and edits files from that networked directory and have a recovery that is up to the second.

Questions I still have include:

 * What do you have to do to the old primary to make it a secondary?
 * Can you have more than one secondary name node (for off-site backup purposes)?
 * Are there plans for distributing the name node function?

=== Answer ===

The secondary name node is not a failover mechanism. It is a helper process for the name node and is of no help if the name node fails. The name is possibly misleading.
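The up-to-the-second recovery described above can be sketched as a small shell routine. This is only a sketch: the directory layout (a `current/` subdirectory holding `fsimage` and `edits`) matches Hadoop's name-node storage format, but the source and destination paths are hypothetical and must match your own configuration.

```shell
#!/bin/sh
# Sketch: restore name node metadata from the surviving network-mounted
# name directory into a replacement node's local name directory, before
# starting the new name node. Paths are illustrative assumptions.

# copy_metadata SRC DST
#   SRC: the network-mounted name directory (e.g. an NFS mount)
#   DST: the replacement node's local dfs.name.dir
copy_metadata() {
    src="$1"
    dst="$2"
    mkdir -p "$dst/current"
    cp "$src/current/fsimage" "$dst/current/fsimage"
    cp "$src/current/edits"   "$dst/current/edits"
}
```

Run this while the name node is stopped; copying the files from under a live name node can yield an inconsistent image.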
In order to provide redundancy for data protection in case of name node failure, the best way is to store the name node metadata on a different machine. Hadoop has an option to use multiple name node directories, and the recommended setup is to place one of them on an NFS share. However, you have to make sure NFS locking will not cause problems, and it is NOT recommended to change this on a live system because it can corrupt the name node data. Another option is to simply copy the name node metadata to another machine.

--Ankur Sethi

'''Question'''

Why not keep the fsimage and editlog in the DFS itself (in some way that lets data nodes locate them without the name node)? Then, when the name node fails, a data node would become the new name node through an election mechanism.

--Cosmin Lehene
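The multiple-name-directory setup above is configured in `hdfs-site.xml`. A minimal sketch, assuming a local disk path and an NFS mount point that are purely illustrative:

```xml
<!-- hdfs-site.xml fragment (sketch): dfs.name.dir takes a
     comma-separated list of directories; the name node writes its
     metadata to all of them. /var/hadoop/name and /mnt/namenode-nfs
     are example paths, not defaults. -->
<property>
  <name>dfs.name.dir</name>
  <value>/var/hadoop/name,/mnt/namenode-nfs/name</value>
</property>
```

As noted above, do not add or remove directories from this list on a live system.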
