I noticed this today and, being fairly new to administering Hadoop, I'm not sure how to get out of this situation without data loss.

The checkpoint hasn't happened since Sept 2nd.

-rw-r--r-- 1 hdfs hdfs        8889 Sep  2 14:09 edits
-rw-r--r-- 1 hdfs hdfs   195968056 Sep  2 14:09 fsimage
-rw-r--r-- 1 hdfs hdfs   195979439 Sep  2 14:09 fsimage.ckpt
-rw-r--r-- 1 hdfs hdfs           8 Sep  2 14:09 fstime
-rw-r--r-- 1 hdfs hdfs         100 Sep  2 14:09 VERSION

/mnt/data0/dfs/nn/image
-rw-r--r-- 1 hdfs hdfs    157 Sep  2 14:09 fsimage
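
For anyone who wants to catch the same condition earlier, here is a minimal sketch that flags checkpoint files whose modification time is older than a threshold. It only looks at file timestamps like the ones in the listings above. The function name, the 24-hour default, and the choice of files to check are my own assumptions, not anything Hadoop provides, and the name directory has to be passed in on the command line.

```python
import os
import sys
import time


def stale_checkpoint_files(name_dir, max_age_hours=24.0, now=None,
                           filenames=("fsimage", "fstime")):
    """Return (filename, age_in_hours) for files older than max_age_hours.

    Hypothetical helper: it only compares file mtimes against a threshold
    and does not inspect the contents of the image or edit log.
    """
    now = time.time() if now is None else now
    stale = []
    for name in filenames:
        path = os.path.join(name_dir, name)
        if not os.path.isfile(path):
            # Skip files that are not present in this directory.
            continue
        age_hours = (now - os.path.getmtime(path)) / 3600.0
        if age_hours > max_age_hours:
            stale.append((name, round(age_hours, 1)))
    return stale


if __name__ == "__main__":
    # Each argument is a directory to check, e.g. the NN name directory.
    for directory in sys.argv[1:]:
        for name, age in stale_checkpoint_files(directory):
            print("%s/%s last modified %.1f hours ago" % (directory, name, age))
```

Run from cron against the name directory, this would have reported the stalled checkpoint a day after Sept 2nd rather than four days later.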

I'm also seeing this in the NN logs:

2011-09-06 16:48:23,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.10.10.11
2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
        at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
        at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
        at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)

On the secondary name node:

2011-09-06 16:51:53,538 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://ftrr-nam6000.chestermcgee.com:50070/getimage?getimage=1
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1360)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1354)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1008)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:183)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:348)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:337)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:337)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:422)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:313)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:276)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.FileNotFoundException: http://ftrr-nam6000.las1.fanops.net:50070/getimage?getimage=1
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1303)
        at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2165)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:175)
        ... 10 more

Any help would be very much appreciated. I'm scared to shut down the NN.
I've already tried restarting the 2NN.

Thank you
-jeremy
