As far as I know, setting up a backup NameNode dir is enough. I haven't used Hadoop in a production environment, so I can't tell you what the right way to reboot the server would be.
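For example, in 0.20 dfs.name.dir can take a comma-separated list of directories, and the NameNode then writes its fsimage and edits to all of them, so losing one copy is no longer fatal. A minimal sketch for hdfs-site.xml (the second path is just a placeholder for something like your sshfs mount):

  <!-- hdfs-site.xml: the second path below is a placeholder for a remote mount -->
  <property>
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop-0.20/cache/hadoop/dfs/name,/mnt/namenode-backup/dfs/name</value>
  </property>

(A rough sketch of the periodic backup-script idea from your quoted message is included after it, below.)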
On Thu, Dec 23, 2010 at 6:50 PM, Bjoern Schiessle <[email protected]> wrote:
> Hi,
>
> On Thu, 23 Dec 2010 09:30:17 +0800 li ping wrote:
> > It seems the exception occurs while the NameNode loads the editlog.
> > Make sure the editlog file exists, or you can debug the application to
> > see what's wrong.
>
> Last night I tried to fix the problem and made a big mistake. Instead of
> copying /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/edits and
> edits.new to a backup, I moved them and later deleted the only version,
> because I thought I had a copy.
>
> The good thing: the NameNode starts again.
> The bad thing: my file system is now in an inconsistent state.
>
> Probably the only solution is to reformat the HDFS and start from
> scratch. Thankfully there wasn't that much data stored in HDFS until
> now, but I definitely have to make sure that this doesn't happen again:
>
> 1. I have set up a second dfs.name.dir which is stored on another
> computer (mounted via sshfs).
> 2. I will install a backup script similar to:
> http://blog.milford.io/2010/10/simple-hadoop-namenode-backup-script
>
> Do you think this should be enough to overcome such situations in the
> future? Any additional ideas on how to make it safer?
>
> I'm still a little bit afraid when I think about the next time I will have
> to reboot the server. Shouldn't a reboot safely stop and restart all
> Hadoop services? Is there anything I can do to make sure that the next
> reboot will not cause the same problems?
>
> Thanks a lot!
> Björn

--
-----李平
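Regarding the backup script linked in the quoted message: the basic idea is to periodically pull the current fsimage and edits from the NameNode's HTTP interface (the getimage servlet on the web UI port, 50070 by default in 0.20) and keep timestamped copies off the machine. A rough sketch of that idea; the host, port and backup directory below are placeholders, not values from your setup:

  #!/usr/bin/env python
  # Sketch of a periodic NameNode metadata backup (fsimage + edits),
  # in the spirit of the script linked above. Host/port/paths are placeholders.
  import os
  import time
  import urllib2

  NAMENODE = "http://namenode.example.com:50070"   # NameNode web UI address
  BACKUP_DIR = "/backup/namenode"                  # where to keep the copies

  def fetch(query, filename):
      # The getimage servlet serves the current fsimage (getimage=1)
      # and the current edit log (getedit=1).
      url = "%s/getimage?%s" % (NAMENODE, query)
      data = urllib2.urlopen(url).read()
      with open(os.path.join(BACKUP_DIR, filename), "wb") as out:
          out.write(data)

  if __name__ == "__main__":
      if not os.path.isdir(BACKUP_DIR):
          os.makedirs(BACKUP_DIR)
      stamp = time.strftime("%Y%m%d-%H%M%S")
      fetch("getimage=1", "fsimage-" + stamp)
      fetch("getedit=1", "edits-" + stamp)

Run it from cron on a different machine than the NameNode, so the copies survive a disk failure there. It is no substitute for a second dfs.name.dir, but it gives you point-in-time copies to fall back on.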
