[ 
https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987164#comment-16987164
 ] 

Wei-Chiu Chuang commented on HDFS-15012:
----------------------------------------

We are actively working on this. However given the nature of the problem it is 
not easy to reproduce in a unit test format.
To be on the safe side, I would suggest reverting HDFS-13101 for now.

Bump the priority to blocker and add release-blocker label to this jira.

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
>                 Key: HDFS-15012
>                 URL: https://issues.apache.org/jira/browse/HDFS-15012
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: nn
>            Reporter: Eric Lin
>            Assignee: Shashikant Banerjee
>            Priority: Blocker
>              Labels: release-blocker
>
> After applying HDFS-13101, and deleting and creating large number of 
> snapshots, SNN exited with below error:
>   
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, 
> snapshotName=distcp-3479-31-old, 
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc
> CallId=1]
> java.lang.AssertionError: Element already exists: 
> element=partition_isactive=true, DELETED=[partition_isactive=true]
>         at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
>         at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
>         at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
>         at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
>         at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
>         at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
>         at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
>         at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
>         at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
> We confirmed that fsimage and edit files were NOT corrupted, as reverting 
> HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 is broken 
> and failed to parse edit log files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to