Hi Peter,

The edits.new file is used when the edits and fsimage are pulled to the secondarynamenode for a checkpoint. Here's the process:

1) The SNN pulls edits and fsimage from the NN
2) The NN starts writing all new edits to edits.new
3) The SNN merges the edits into the fsimage and sends the new fsimage to the NN
4) The NN replaces its fsimage with the SNN's fsimage
5) The NN replaces edits with edits.new

Taking a different fsimage and trying to apply the existing edits to it certainly won't work; the edit log only replays cleanly against the exact fsimage it was written on top of. Your best bet might be to take the 3-day-old fsimage with an empty edits file and delete edits.new. But before you do any of this, make sure you completely back up every directory configured in dfs.name.dir and dfs.checkpoint.dir.
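A minimal sketch of what I mean, assuming dfs.name.dir is /data/dfs/name and dfs.checkpoint.dir is /data/dfs/namesecondary (both hypothetical; substitute whatever your hdfs-site.xml actually lists for those two properties):

    # Run with the NameNode and SecondaryNameNode daemons stopped.
    # Paths are hypothetical; copy aside EVERY directory listed in
    # dfs.name.dir and dfs.checkpoint.dir.
    cp -a /data/dfs/name /data/dfs/name.bak-$(date +%Y%m%d)                    # on the NN host
    cp -a /data/dfs/namesecondary /data/dfs/namesecondary.bak-$(date +%Y%m%d)  # on the SNN host

That way you can always get back to the current (broken but intact) state if a recovery attempt makes things worse.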
What are the timestamps on the fsimage files in each dfs.name.dir and dfs.checkpoint.dir? Do the namenode and secondarynamenode have enough disk space? Have you consulted the logs to learn why the SNN/NN didn't properly update the fsimage and edits log?
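A quick way to gather all three answers (same hypothetical paths as above; the layout under dfs.checkpoint.dir can vary slightly by version):

    # Timestamps on the metadata files in each directory:
    ls -l /data/dfs/name/current/fsimage /data/dfs/name/current/edits /data/dfs/name/current/edits.new
    ls -l /data/dfs/namesecondary/current/fsimage /data/dfs/namesecondary/current/edits
    # Free space on each host:
    df -h /data/dfs/name /data/dfs/namesecondary

If the SNN's fsimage really hasn't changed in a month, checkpointing has probably been failing silently for that long, and the SNN log should tell you why.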
Hope this helps.

Alex

On Wed, Jul 7, 2010 at 7:34 AM, Peter Falk <[email protected]> wrote:

> Just a little update. We found a working fsimage that was just a couple of
> days older than the corrupt one. We tried to replace the fsimage with the
> working one, and kept the edits and edits.new files, hoping that the latest
> edits would still be in use. However, when starting the namenode, the
> following error message appears. Any thoughts, ideas, or hints on how to
> continue? Edit the edits files somehow?
>
> TIA,
> Peter
>
> 2010-07-07 16:21:10,312 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 28372
> 2010-07-07 16:21:11,162 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 8
> 2010-07-07 16:21:11,164 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 3315887 loaded in 0 seconds.
> 2010-07-07 16:21:11,164 DEBUG org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 9: /hbase/.logs/miller,60020,1274447474064/hlog.dat.1274706452423 numblocks : 1 clientHolder clientMachine
> 2010-07-07 16:21:11,164 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /hbase/.logs/miller,60020,1274447474064/hlog.dat.1274706452423 because it does not exist
> 2010-07-07 16:21:11,164 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1006)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:982)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:194)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:615)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
>
> 2010-07-07 16:21:11,165 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at fanta/192.168.10.53
> ************************************************************/
>
>
> On Wed, Jul 7, 2010 at 14:46, Peter Falk <[email protected]> wrote:
>
> > Hi,
> >
> > After a restart of our live cluster today, the name node fails to start
> > with the log message seen below. There is a big file called edits.new in
> > the "current" folder that seems to be the only one that has received
> > changes recently (no changes to the edits or the fsimage for over a
> > month). Is that normal?
> >
> > The last change to the edits.new file was right before shutting down the
> > cluster. It seems like the shutdown was unable to store valid fsimage,
> > edits, and edits.new files. The secondary name node image does not
> > include the edits.new file, only edits and fsimage, which are identical
> > to the name node's versions. So no help from there.
> >
> > Would appreciate any help in understanding what could have gone wrong.
> > The shutdown seemed to complete just fine, without any error message. Is
> > there any way to recreate the image from the data, or any other way to
> > save our production data?
> >
> > Sincerely,
> > Peter
> >
> > 2010-07-07 14:30:26,949 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
> > 2010-07-07 14:30:26,960 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
> > 2010-07-07 14:30:27,019 DEBUG org.apache.hadoop.security.UserGroupInformation: Unix Login: hbase,hbase
> > 2010-07-07 14:30:27,149 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> > java.io.EOFException
> >         at java.io.DataInputStream.readShort(DataInputStream.java:298)
> >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
> >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
> >         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
> >         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
> > 2010-07-07 14:30:27,150 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> > 2010-07-07 14:30:27,151 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
> >         at java.io.DataInputStream.readShort(DataInputStream.java:298)
> >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
> >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
> >         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
> >         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
