[ https://issues.apache.org/jira/browse/HDFS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819179#comment-13819179 ]
Jing Zhao commented on HDFS-5425:
---------------------------------
Thanks for the work, Vinay and Uma!
The issue here is that we want to replace an INodeFile with an INodeFileUC.
However, because of the rename operation, the original INodeFile is actually
referenced by INodeReference instances. So in the unit test in Vinay's
patch, before the replacement, we have:
{code}
snapshot s0
deleted list: bar2 (INodeReference.WithName)
created list: bar2 (INodeReference.DstReference)
{code}
where both bar2 instances point to the same INodeReference.WithCount node,
which in turn points to the actual INodeFile instance.
Thus, for the replacement, we only need to make the WithCount node point to the
new INodeFileUC instance, rather than replacing the reference nodes in the diff
list of s0.
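A toy Java model of the structure described above may help. It is only a sketch under stated assumptions: the class names merely echo the HDFS ones and are NOT the real NameNode classes, and the diff-list replacement is assumed (hypothetically) to find the entry by inode id and then require that entry to be the very same object as the old file, which is one plausible reading of the Preconditions.checkState failure in ChildrenDiff.replace shown in the stack trace. It shows that replacing inside the diff list fails because the list holds a reference, while redirecting the shared WithCount node updates both references at once.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the HDFS-5425 scenario. Names echo the HDFS
// classes for readability only; these are NOT the real NameNode classes.
public class WithCountReplaceSketch {

  // Plain node with an inode-style id (stands in for an INode).
  static class Node {
    final long id;
    Node(long id) { this.id = id; }
    long getId() { return id; }
  }

  // Stands in for INodeFile.
  static class FileNode extends Node {
    FileNode(long id) { super(id); }
  }

  // Stands in for the under-construction file ("INodeFileUC").
  static class FileUcNode extends FileNode {
    FileUcNode(long id) { super(id); }
  }

  // Stands in for INodeReference.WithCount: the single shared indirection
  // that actually points at the file.
  static class WithCount extends Node {
    Node referred;
    WithCount(Node referred) { super(referred.getId()); this.referred = referred; }
  }

  // Stands in for WithName / DstReference: reports the file's id, but is a
  // different object from the file itself.
  static class Ref extends Node {
    final WithCount withCount;
    Ref(WithCount withCount) { super(withCount.getId()); this.withCount = withCount; }
    Node resolve() { return withCount.referred; }
  }

  // Naive approach (assumed behavior): find the entry by id, but insist, like
  // a checkState, that the entry IS oldChild before replacing it.
  static void replaceInDiffList(List<Node> list, Node oldChild, Node newChild) {
    for (int i = 0; i < list.size(); i++) {
      if (list.get(i).getId() == oldChild.getId()) {
        if (list.get(i) != oldChild) {
          throw new IllegalStateException("entry is a reference, not the file");
        }
        list.set(i, newChild);
        return;
      }
    }
  }

  // Approach from the comment: leave the diff lists alone and redirect the
  // shared WithCount node, so every reference sees the new file.
  static void replaceReferred(WithCount withCount, Node newChild) {
    withCount.referred = newChild;
  }

  public static void main(String[] args) {
    FileNode bar2 = new FileNode(1001L);
    WithCount wc = new WithCount(bar2);
    Ref withName = new Ref(wc); // entry in s0's deleted list
    Ref dstRef = new Ref(wc);   // entry in s0's created list
    List<Node> deleted = new ArrayList<>();
    deleted.add(withName);
    List<Node> created = new ArrayList<>();
    created.add(dstRef);
    FileUcNode bar2Uc = new FileUcNode(bar2.getId());

    try {
      replaceInDiffList(created, bar2, bar2Uc);
      System.out.println("diff-list replace succeeded");
    } catch (IllegalStateException e) {
      System.out.println("diff-list replace failed: " + e.getMessage());
    }

    replaceReferred(wc, bar2Uc);
    System.out.println("WithName sees UC file: " + (withName.resolve() == bar2Uc));
    System.out.println("DstReference sees UC file: " + (dstRef.resolve() == bar2Uc));
    System.out.println("diff lists untouched: "
        + (deleted.get(0) == withName && created.get(0) == dstRef));
  }
}
{code}

Running main prints that the diff-list replace failed, then "true" for the three checks. The design point is that the WithCount node is the only place the file pointer lives, so swapping it there avoids touching any snapshot diff.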
> Renaming underconstruction file with snapshots can make NN failure on restart
> -----------------------------------------------------------------------------
>
> Key: HDFS-5425
> URL: https://issues.apache.org/jira/browse/HDFS-5425
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.2.0
> Reporter: sathish
> Assignee: Vinay
> Attachments: HDFS-5425.patch, HDFS-5425.patch, HDFS-5425.patch
>
>
> I faced this while doing some snapshot operations such as createSnapshot and
> renameSnapshot. When I restarted my NN, it shut down with the following
> exception:
> 2013-10-24 21:07:03,040 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
> java.lang.IllegalStateException
> at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.replace(INodeDirectoryWithSnapshot.java:82)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.access$700(INodeDirectoryWithSnapshot.java:62)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.replaceChild(INodeDirectoryWithSnapshot.java:397)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.access$900(INodeDirectoryWithSnapshot.java:376)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot.replaceChild(INodeDirectoryWithSnapshot.java:598)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedReplaceINodeFile(FSDirectory.java:1548)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.replaceINodeFile(FSDirectory.java:1537)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadFilesUnderConstruction(FSImageFormat.java:855)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:350)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:910)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:899)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:751)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:720)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:266)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:784)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:563)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:422)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:472)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:670)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:655)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1245)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1311)
> 2013-10-24 21:07:03,050 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> 2013-10-24 21:07:03,052 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
--
This message was sent by Atlassian JIRA
(v6.1#6144)