I noticed this today, and since I'm fairly new to administering Hadoop, I'm
not sure how to recover from this situation without losing data.
No checkpoint has completed since Sept 2nd:
-rw-r--r-- 1 hdfs hdfs 8889 Sep 2 14:09 edits
-rw-r--r-- 1 hdfs hdfs 195968056 Sep 2 14:09 fsimage
-rw-r--r-- 1 hdfs hdfs 195979439 Sep 2 14:09 fsimage.ckpt
-rw-r--r-- 1 hdfs hdfs 8 Sep 2 14:09 fstime
-rw-r--r-- 1 hdfs hdfs 100 Sep 2 14:09 VERSION
/mnt/data0/dfs/nn/image:
-rw-r--r-- 1 hdfs hdfs 157 Sep 2 14:09 fsimage
I'm also seeing this in the NN logs:
2011-09-06 16:48:23,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.10.10.11
2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
    at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
    at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
On the secondary name node:
2011-09-06 16:51:53,538 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://ftrr-nam6000.chestermcgee.com:50070/getimage?getimage=1
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1360)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1354)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1008)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:183)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:348)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:337)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:337)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:422)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:313)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:276)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.FileNotFoundException: http://ftrr-nam6000.las1.fanops.net:50070/getimage?getimage=1
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1303)
    at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2165)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:175)
    ... 10 more
Any help would be very much appreciated. I'm afraid to shut down the NN, and
I've already tried restarting the 2NN.
Thank You
-jeremy