[ 
https://issues.apache.org/jira/browse/HADOOP-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065848#comment-13065848
 ] 

Gerrit Jansen van Vuuren commented on HADOOP-7458:
--------------------------------------------------

This does mean the edit log is corrupted, but it should be something the 
namenode can recover from. 
The reasoning is: say you have 30TB of data, and the only corruption in the 
metadata is the last 100 lines of the edit log. 
Should you lose all 30TB of data because the last 100 lines are 
corrupt? This should not be the case.



A very simple fix is to edit the FSImage.java file and wrap the edit-log 
loading in a try/catch. 
The best solution would be that when this happens, the namenode prints an 
error and goes into safemode automatically, but it must still load rather 
than crash.
I hit this error recently and decided to recompile the hadoop core jar with 
a simple try/catch. It worked and I recovered all of my data, except for some 
hadoop job-temp files, which I don't care about anyhow.


The patch is:

Index: src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSImage.java
===================================================================
--- src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSImage.java        
(revision 1145902)
+++ src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSImage.java        
(working copy)
@@ -1003,12 +1003,20 @@
     int numEdits = 0;
     EditLogFileInputStream edits = 
       new EditLogFileInputStream(getImageFile(sd, NameNodeFile.EDITS));
+    try {
     numEdits = FSEditLog.loadFSEdits(edits);
+    } catch (Throwable t) {
+      t.printStackTrace();
+    }
     edits.close();
     File editsNew = getImageFile(sd, NameNodeFile.EDITS_NEW);
     if (editsNew.exists() && editsNew.length() > 0) {
       edits = new EditLogFileInputStream(editsNew);
+      try {
       numEdits += FSEditLog.loadFSEdits(edits);
+      } catch (Throwable t) {
+        t.printStackTrace();
+      }
       edits.close();
     }
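The idea behind the patch can be illustrated outside of the FSImage code. A minimal, self-contained sketch (not the real HDFS implementation; the record type and the null-as-corruption marker are purely illustrative) of "apply edits until the first corrupt record, keep what was applied so far, and flag the problem instead of crashing":

```java
import java.util.List;

// Minimal sketch, NOT the actual FSImage/FSEditLog code: replay edit-log
// records one by one and stop at the first corrupted record instead of
// aborting namenode startup.
public class EditLogRecovery {
    // Hypothetical record type: a null entry stands in for a corrupted
    // record in this simulation.
    static int loadEdits(List<String> records) {
        int applied = 0;
        boolean corrupted = false;
        for (String rec : records) {
            if (rec == null) {   // simulated corruption marker
                corrupted = true;
                break;           // keep everything applied so far
            }
            applied++;           // "apply" the edit
        }
        if (corrupted) {
            // A real fix would log this and put the namenode into
            // safemode here, rather than letting it crash.
            System.err.println("Edit log truncated after " + applied
                    + " edits; manual inspection recommended");
        }
        return applied;
    }
}
```

The key difference from the blanket try/catch in the patch is that the loader knows exactly how many edits were applied before the corruption, which is the information an operator would want before deciding whether the lost tail matters.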



> Namenode not get started! FSNamesystem initialization failed. 
> java.io.FileNotFoundException
> -------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7458
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7458
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.20.2
>         Environment: CentOS release 5.5 (Final), 18 node Cluster 
>            Reporter: Sakthivel Murugasamy
>            Priority: Blocker
>              Labels: hadoop
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> 2011-07-13 12:04:12,967 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.
> java.io.FileNotFoundException: File does not exist: 
> /opt/data/tmp/mapred/system/job_201107041958_0120/j^@^@^@^@^@^@
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetPermission(FSDirectory.java:544)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:724)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
> 2011-07-13 12:04:13,006 ERROR 
> org.apache.hadoop.hdfs.server.namenode.NameNode: 
> java.io.FileNotFoundException: File does not exist: 
> /opt/data/tmp/mapred/system/job_201107041958_0120/j^@^@^@^@^@^@
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetPermission(FSDirectory.java:544)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:724)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
> In the path /opt/data/tmp/mapred, "system/" folder itself is not available

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
