I am about to attempt setting up a Hadoop file system for an application. HDFS has a single point of failure, the NameNode. Can you explain the steps necessary for bringing HDFS back up in case of a NameNode failure?
Before asking this question I went through these pages: http://wiki.apache.org/lucene-hadoop-data/attachments/HadoopPresentations/attachments/HDFSDescription.pdf and http://lucene.apache.org/hadoop/hdfs_design.html. These describe the overall architecture and the fact that one can have secondary NameNodes.

Let's say the NameNode machine just died. From the documentation: "The Namenode machine is a single point of failure for an HDFS cluster. If the Namenode machine fails, manual intervention is necessary. Currently, automatic restart and failover of the Namenode software to another machine is not supported."

So what is this manual intervention? I am confused about this. All the nodes have a configuration file with the master NameNode set, so presumably one should bring up a machine with the same name/IP address. Then what? Can one start a NameNode server on the new machine and have it repopulate its metadata on its own? Please explain.

Sorry if this has been asked before. I did search the mailing list, the FAQ page, and the documentation before asking.

Thanks,
Ankur
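For context, here is the kind of configuration I imagine is involved. This is only a sketch based on the property names I found in the docs (fs.default.name and dfs.name.dir); the hostname, port, and NFS path are hypothetical placeholders:

```xml
<!-- hadoop-site.xml (sketch): every node points at the master NameNode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- hypothetical host/port for the master NameNode -->
    <value>hdfs://namenode-host:9000</value>
  </property>
  <!-- dfs.name.dir accepts a comma-separated list, so the NameNode
       metadata (fsimage + edit log) can be written to more than one
       directory, e.g. a local disk plus an NFS mount. A replacement
       machine could then recover the namespace from the surviving copy. -->
  <property>
    <name>dfs.name.dir</name>
    <value>/local/hdfs/name,/mnt/nfs/hdfs/name</value>
  </property>
</configuration>
```

My guess at the "manual intervention" is: bring up a new machine under the same hostname/IP, point its dfs.name.dir at a surviving copy of the metadata, and restart the NameNode. Is that right?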
