Hey, yesterday we restarted our NameNode for the first time in a while to push out some new configuration updates. When it started again we got this error:

2010-12-01 10:59:39,635 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 121229
2010-12-01 10:59:41,578 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 126
2010-12-01 10:59:41,598 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 19581054 loaded in 1 seconds.
2010-12-01 10:59:41,600 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1073)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1085)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:992)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:195)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:615)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:999)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:312)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:293)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:224)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:306)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1004)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1013)

I tracked this down to a race condition in saving the edit log:

https://issues.apache.org/jira/browse/HDFS-909

We fixed this by applying the patch from here:

https://issues.apache.org/jira/browse/HDFS-1002

which meant we could start the NameNode and let it fix the edit log,
but it also meant we lost some files from HDFS.

We're using CDH2-169.68, and this bug was fixed in CDH2-169.113, released in September, so I would recommend that everyone upgrade to that!

Thanks,

--
Dan Harvey | Datamining Engineer
www.mendeley.com/profiles/dan-harvey

Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015