[ https://issues.apache.org/jira/browse/HDFS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006527#comment-13006527 ]
dhruba borthakur commented on HDFS-1594:
----------------------------------------

@Devaraj: your proposal to make the NN go into safemode if the fsedits partition is almost full sounds good.

@Todd: thanks for the info. If there is a possibility of fsedit corruption when the disk is full (irrespective of the patch suggested in this JIRA), then we need to fix it. This patch tries to avoid this situation but is not foolproof. If the NN can (somehow) know that this is the last partial transaction, then it can safely ignore it. Maybe, when we pre-allocate the edits log, we should fill it up with a specific pattern so that it is easy to detect if the partial transaction is the last one?

> When the disk becomes full Namenode is getting shutdown and not able to recover
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-1594
>                 URL: https://issues.apache.org/jira/browse/HDFS-1594
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0, 0.21.1, 0.22.0
>         Environment: Linux linux124 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Devaraj K
>             Fix For: 0.23.0
>
>         Attachments: HDFS-1594.patch, HDFS-1594.patch, HDFS-1594.patch, hadoop-root-namenode-linux124.log
>
>
> When the disk becomes full name node is shutting down and if we try to start after making the space available It is not starting and throwing the below exception.
> {code:xml}
> 2011-01-24 23:23:33,727 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> java.io.EOFException
>     at java.io.DataInputStream.readFully(DataInputStream.java:180)
>     at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,729 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
>     at java.io.DataInputStream.readFully(DataInputStream.java:180)
>     at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,730 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at linux124/10.18.52.124
> ************************************************************/
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira