What Alex said. It also really looks like
https://issues.apache.org/jira/browse/HDFS-1024, speaking from experience
with that issue.
J-D

On Wed, Jul 7, 2010 at 8:07 AM, Alex Loddengaard <[email protected]> wrote:
> Hi Peter,
>
> The edits.new file is used while the edits and fsimage are pulled to the
> secondary namenode. Here's the process:
>
> 1) SNN pulls edits and fsimage
> 2) NN starts writing edits to edits.new
> 3) SNN sends the new fsimage to the NN
> 4) NN replaces its fsimage with the SNN's fsimage
> 5) NN replaces edits with edits.new
>
> Certainly taking a different fsimage and trying to apply edits to it
> won't work. Your best bet might be to take the 3-day-old fsimage with an
> empty edits and delete edits.new. But before you do any of this, make
> sure you completely back up every directory listed in dfs.name.dir and
> dfs.checkpoint.dir. What are the timestamps on the fsimage files in each
> dfs.name.dir and dfs.checkpoint.dir?
>
> Do the namenode and secondary namenode have enough disk space? Have you
> consulted the logs to learn why the SNN/NN didn't properly update the
> fsimage and edits log?
>
> Hope this helps.
>
> Alex
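(A minimal sketch of the backup and timestamp check Alex describes, in
Python. The /data/dfs/name and /data/dfs/namesecondary paths are
placeholders; substitute the actual dfs.name.dir and dfs.checkpoint.dir
values from hdfs-site.xml, and run it with the namenode stopped.)

    import os
    import shutil
    import time

    # Placeholder paths: replace with the real dfs.name.dir and
    # dfs.checkpoint.dir values (each setting may list several directories).
    NAME_DIRS = ["/data/dfs/name"]
    CHECKPOINT_DIRS = ["/data/dfs/namesecondary"]
    BACKUP_ROOT = "/root/nn-backup-%d" % int(time.time())

    for src in NAME_DIRS + CHECKPOINT_DIRS:
        # Copy the entire storage directory before touching anything in it.
        dst = os.path.join(BACKUP_ROOT, src.strip("/").replace("/", "_"))
        shutil.copytree(src, dst)
        # Report the timestamps Alex asks about.
        for name in ("fsimage", "edits", "edits.new"):
            path = os.path.join(src, "current", name)
            if os.path.exists(path):
                st = os.stat(path)
                print("%s  %d bytes  modified %s"
                      % (path, st.st_size, time.ctime(st.st_mtime)))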
> On Wed, Jul 7, 2010 at 7:34 AM, Peter Falk <[email protected]> wrote:
> > Just a little update. We found a working fsimage that was just a couple
> > of days older than the corrupt one. We tried to replace the fsimage
> > with the working one, and kept the edits and edits.new files, hoping
> > that the latest edits would still be applied. However, when starting
> > the namenode, the following error message appears. Any thoughts, ideas,
> > or hints on how to continue? Edit the edits files somehow?
> >
> > TIA,
> > Peter
> >
> > 2010-07-07 16:21:10,312 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 28372
> > 2010-07-07 16:21:11,162 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 8
> > 2010-07-07 16:21:11,164 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 3315887 loaded in 0 seconds.
> > 2010-07-07 16:21:11,164 DEBUG org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 9: /hbase/.logs/miller,60020,1274447474064/hlog.dat.1274706452423 numblocks : 1 clientHolder clientMachine
> > 2010-07-07 16:21:11,164 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /hbase/.logs/miller,60020,1274447474064/hlog.dat.1274706452423 because it does not exist
> > 2010-07-07 16:21:11,164 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
> >         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1006)
> >         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:982)
> >         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:194)
> >         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:615)
> >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
> >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
> >         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
> >         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
> >
> > 2010-07-07 16:21:11,165 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> > /************************************************************
> > SHUTDOWN_MSG: Shutting down NameNode at fanta/192.168.10.53
> > ************************************************************/
> >
> > On Wed, Jul 7, 2010 at 14:46, Peter Falk <[email protected]> wrote:
> > > Hi,
> > >
> > > After a restart of our live cluster today, the namenode fails to
> > > start with the log message seen below. There is a big file called
> > > edits.new in the "current" folder that seems to be the only one that
> > > has received changes recently (no changes to the edits or the fsimage
> > > for over a month). Is that normal?
> > >
> > > The last change to the edits.new file was right before shutting down
> > > the cluster. It seems like the shutdown was unable to store valid
> > > fsimage, edits, and edits.new files. The secondary namenode's image
> > > does not include an edits.new file, only edits and fsimage, which are
> > > identical to the namenode's versions, so no help from there.
> > >
> > > Would appreciate any help in understanding what could have gone
> > > wrong. The shutdown seemed to complete just fine, without any error
> > > message. Is there any way to recreate the image from the data, or any
> > > other way to save our production data?
> > >
> > > Sincerely,
> > > Peter
> > >
> > > 2010-07-07 14:30:26,949 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
> > > 2010-07-07 14:30:26,960 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
> > > 2010-07-07 14:30:27,019 DEBUG org.apache.hadoop.security.UserGroupInformation: Unix Login: hbase,hbase
> > > 2010-07-07 14:30:27,149 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> > > java.io.EOFException
> > >         at java.io.DataInputStream.readShort(DataInputStream.java:298)
> > >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
> > >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
> > >         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
> > >         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
> > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
> > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
> > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
> > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
> > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
> > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
> > > 2010-07-07 14:30:27,150 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> > > 2010-07-07 14:30:27,151 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
> > >         at java.io.DataInputStream.readShort(DataInputStream.java:298)
> > >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
> > >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
> > >         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
> > >         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
> > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
> > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
> > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
> > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
> > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
> > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
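(A toy illustration, in plain Python rather than HDFS code, of why Alex's
warning about mixing images and edit logs holds: the edit log is a sequence
of operations that only makes sense against the exact namespace the
matching fsimage was written from, so replaying it over an older image hits
operations on paths that image never contained. That is the same shape of
failure as the unprotectedDelete/NullPointerException in the log above; the
paths and operations below are invented for the example.)

    # Toy namespace model: a dict from path to metadata.
    def replay(image, edits):
        namespace = dict(image)
        for op, path in edits:
            if op == "add":
                namespace[path] = "file"
            elif op == "delete":
                # Deleting a path the base image never contained means the
                # image and the edit log do not belong together.
                if path not in namespace:
                    raise RuntimeError("replay failed: %s does not exist" % path)
                del namespace[path]
        return namespace

    # An fsimage and the edits checkpointed against it: replay succeeds.
    image_matching = {"/hbase/.logs/hlog.dat.1": "file"}
    edits = [("delete", "/hbase/.logs/hlog.dat.1"),
             ("add", "/hbase/.logs/hlog.dat.2")]
    print(replay(image_matching, edits))  # {'/hbase/.logs/hlog.dat.2': 'file'}

    # An older fsimage with the same edits: the delete targets a path the
    # old image never had, and replay fails partway through.
    try:
        replay({}, edits)
    except RuntimeError as e:
        print(e)  # replay failed: /hbase/.logs/hlog.dat.1 does not exist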
