Things still work in HDFS, but the edits file is not being updated. The timestamp is Sept 2nd.

-jeremy
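A quick way to combine Ravi's question 2 below (write something into HDFS, then read it back) with the edits-file observation is to do the write through the client API and then stat the edits file on the NameNode host. A minimal sketch, assuming a 0.20-era Hadoop client on the classpath and that the name directory is /mnt/data0/dfs/nn (an inference from the listing further down, so adjust to your dfs.name.dir):

    // EditsProbe.java -- hypothetical helper, not part of Hadoop.
    // Writes and re-reads a small HDFS file, then reports the local
    // mtime of the NameNode's edits file.
    import java.io.File;
    import java.util.Date;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class EditsProbe {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/edits-probe-" + System.currentTimeMillis());
        FSDataOutputStream out = fs.create(p);  // a file create should journal an entry to edits
        out.writeUTF("probe");
        out.close();
        System.out.println("read back ok: " + fs.exists(p));
        fs.delete(p, true);
        // Assumed path -- substitute your configured dfs.name.dir.
        File edits = new File("/mnt/data0/dfs/nn/current/edits");
        System.out.println("edits mtime: " + new Date(edits.lastModified()));
      }
    }

If the write and read succeed but the edits mtime never moves off Sep 2, that is exactly the inconsistency being discussed below.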
On Sep 7, 2011, at 9:45 AM, Ravi Prakash <[email protected]> wrote:

> If your HDFS is still working, the fsimage file won't be getting updated, but
> the edits file still should be. That's why I asked question 2.
>
> On Wed, Sep 7, 2011 at 11:39 AM, Jeremy Hansen <[email protected]> wrote:
>
>> The problem is that fsimage and edits are no longer being updated, so… if I
>> restart, how could it replay those?
>>
>> -jeremy
>>
>> On Sep 7, 2011, at 8:48 AM, Ravi Prakash wrote:
>>
>>> Actually, I take that back. Restarting the NN might not result in loss of
>>> data. It will probably just take longer to start up, because it would read
>>> the fsimage and then apply the edits itself (rather than the SNN doing it).
>>>
>>> On Wed, Sep 7, 2011 at 10:46 AM, Ravi Prakash <[email protected]> wrote:
>>>
>>>> Hi Jeremy,
>>>>
>>>> A couple of questions:
>>>>
>>>> 1. Which version of Hadoop are you using?
>>>> 2. If you write something into HDFS, can you subsequently read it?
>>>> 3. Are you sure your secondarynamenode configuration is correct? It seems
>>>>    like your SNN is telling your NN to roll the edit log (start writing to
>>>>    edits.new in the journaling directory), but when it then tries to
>>>>    download the image file, it's not finding it (a probe for this is
>>>>    sketched after this message).
>>>> 4. I wish I could say I haven't ever seen that stack trace in the logs. I
>>>>    was seeing something similar (not the same, quite far from it actually):
>>>>    https://issues.apache.org/jira/browse/HDFS-2011
>>>>
>>>> If I were you, and I felt exceptionally brave (mind you, I've worked with
>>>> only test systems, no production sys-admin guts for me ;-) ), I would
>>>> probably do everything I can to get the secondarynamenode started properly
>>>> and make it checkpoint properly.
>>>>
>>>> Methinks restarting the namenode will most likely result in loss of data.
>>>>
>>>> Hope this helps,
>>>> Ravi.
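On question 3 above: during a checkpoint the 2NN asks the NN to roll the edit log and then downloads the image from the NN's getimage servlet over HTTP, so the download half can be reproduced by hand to separate an NN-side failure from a 2NN misconfiguration. A minimal probe sketch; the hostname and port are placeholders for the NN's dfs.http.address:

    // GetImageProbe.java -- hypothetical helper, not part of Hadoop.
    // Issues the same HTTP request the 2NN makes when fetching fsimage.
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class GetImageProbe {
      public static void main(String[] args) throws Exception {
        // Placeholder address -- use your NN's dfs.http.address.
        URL u = new URL("http://namenode.example.com:50070/getimage?getimage=1");
        HttpURLConnection conn = (HttpURLConnection) u.openConnection();
        conn.setRequestMethod("GET");
        // getResponseCode() returns the status instead of throwing on 4xx/5xx,
        // unlike getInputStream(), which is where the 2NN's
        // FileNotFoundException below comes from.
        System.out.println("HTTP " + conn.getResponseCode() + " "
            + conn.getResponseMessage());
        conn.disconnect();
      }
    }

A non-2xx status here, paired with the NN logging "GetImage failed", points at the NN side rather than at the 2NN's configuration.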
>>>> On Tue, Sep 6, 2011 at 7:26 PM, Jeremy Hansen <[email protected]> wrote:
>>>>
>>>>> I happened to notice this today, and being fairly new to administering
>>>>> hadoop, I'm not exactly sure how to pull out of this situation without
>>>>> data loss.
>>>>>
>>>>> The checkpoint hasn't happened since Sept 2nd.
>>>>>
>>>>> -rw-r--r-- 1 hdfs hdfs      8889 Sep  2 14:09 edits
>>>>> -rw-r--r-- 1 hdfs hdfs 195968056 Sep  2 14:09 fsimage
>>>>> -rw-r--r-- 1 hdfs hdfs 195979439 Sep  2 14:09 fsimage.ckpt
>>>>> -rw-r--r-- 1 hdfs hdfs         8 Sep  2 14:09 fstime
>>>>> -rw-r--r-- 1 hdfs hdfs       100 Sep  2 14:09 VERSION
>>>>>
>>>>> /mnt/data0/dfs/nn/image
>>>>> -rw-r--r-- 1 hdfs hdfs       157 Sep  2 14:09 fsimage
>>>>>
>>>>> I'm also seeing this in the NN logs:
>>>>>
>>>>> 2011-09-06 16:48:23,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.10.10.11
>>>>> 2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed.
>>>>> java.lang.NullPointerException
>>>>>     at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
>>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>>>>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>>>>>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>>>>>     at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
>>>>>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>>>>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>>>>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>>>>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>>>>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>>>>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>>>>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>>>>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>>>>     at org.mortbay.jetty.Server.handle(Server.java:326)
>>>>>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>>>>     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>>>>>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>>>>>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>>>>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
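The NullPointerException above is raised in FSImage.getImageFile while the NN is serving /getimage, which suggests the NN could not resolve an image file from its storage directories; the stale fsimage.ckpt in the Sep 2 listing likewise hints that an earlier checkpoint upload never completed. A low-risk first check (a plain file-existence sketch, not a Hadoop tool; the directory list is an assumption based on the paths above) is to confirm each configured dfs.name.dir still holds readable metadata files:

    // NameDirCheck.java -- hypothetical helper, not part of Hadoop.
    // Reports existence, readability, and size of the NN metadata files.
    import java.io.File;

    public class NameDirCheck {
      // Placeholder -- list every directory configured in dfs.name.dir.
      static final String[] NAME_DIRS = { "/mnt/data0/dfs/nn" };

      public static void main(String[] args) {
        String[] files = { "current/fsimage", "current/edits",
                           "current/fstime", "current/VERSION",
                           "image/fsimage" };
        for (String dir : NAME_DIRS) {
          for (String name : files) {
            File f = new File(dir, name);
            System.out.println(f + "  exists=" + f.exists()
                + "  readable=" + f.canRead() + "  size=" + f.length());
          }
        }
      }
    }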
>>>>> On the secondary name node:
>>>>>
>>>>> 2011-09-06 16:51:53,538 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://ftrr-nam6000.chestermcgee.com:50070/getimage?getimage=1
>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>>>>     at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1360)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1354)
>>>>>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1008)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:183)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:348)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:337)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:337)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:422)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:313)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:276)
>>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>> Caused by: java.io.FileNotFoundException: http://ftrr-nam6000.las1.fanops.net:50070/getimage?getimage=1
>>>>>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1303)
>>>>>     at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2165)
>>>>>     at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:175)
>>>>>     ... 10 more
>>>>>
>>>>> Any help would be very much appreciated. I'm scared to shut down the NN.
>>>>> I've tried restarting the 2NN.
>>>>>
>>>>> Thank You
>>>>> -jeremy
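Whatever the fix turns out to be, it would be prudent to copy every dfs.name.dir somewhere safe while the NN is still up and the Sep 2 fsimage and edits are intact, since per the thread nothing is currently being appended to them. A minimal sketch; the source and destination paths are placeholders, and this is a plain recursive file copy, not a Hadoop API:

    // NameDirBackup.java -- hypothetical helper, not part of Hadoop.
    // Recursively copies a NameNode metadata directory to a backup location.
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.util.stream.Stream;

    public class NameDirBackup {
      public static void main(String[] args) throws IOException {
        // Placeholders -- repeat for each configured dfs.name.dir.
        Path src = Paths.get("/mnt/data0/dfs/nn");
        Path dst = Paths.get("/var/backups/nn-" + System.currentTimeMillis());
        try (Stream<Path> tree = Files.walk(src)) {
          tree.forEach(p -> {
            try {
              // Parents are visited before children, so directories
              // are created before the files inside them are copied.
              Files.copy(p, dst.resolve(src.relativize(p)),
                         StandardCopyOption.COPY_ATTRIBUTES);
            } catch (IOException e) {
              throw new UncheckedIOException(e);
            }
          });
        }
        System.out.println("copied " + src + " -> " + dst);
      }
    }

With a copy of the metadata in hand, restarting the NN or rebuilding the 2NN is recoverable even in the worst case.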
