[ https://issues.apache.org/jira/browse/HDFS-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157876#comment-16157876 ]
Hudson commented on HDFS-12369:
-------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12813 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/12813/])
HDFS-12369. Edit log corruption due to hard lease recovery of not-closed (xiao: rev 52b894db33bc68b46eec5cdf2735dfcf4030853a)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
> Edit log corruption due to hard lease recovery of not-closed file which has
> snapshots
> -------------------------------------------------------------------------------------
>
> Key: HDFS-12369
> URL: https://issues.apache.org/jira/browse/HDFS-12369
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.3
>
> Attachments: HDFS-12369.01.patch, HDFS-12369.02.patch,
> HDFS-12369.03.patch, HDFS-12369.test.patch
>
>
> HDFS-6257 and HDFS-7707 worked hard to prevent corruption from combinations
> of client operations.
> Recently, we observed a NameNode (NN) that was unable to start, failing with the following exception:
> {noformat}
> 2017-08-17 14:32:18,418 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> java.io.FileNotFoundException: File does not exist: /home/Events/CancellationSurvey_MySQL/2015/12/31/.part-00000.9nlJ3M
>     at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
>     at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:429)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:897)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:750)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:318)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1125)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:789)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {noformat}
> Quoting a nice analysis of the edits:
> {quote}
> In the edits logged about 1 hour later, we see this failing OP_CLOSE. The
> sequence in the edits shows the file going through:
> OPEN
> ADD_BLOCK
> CLOSE
> ADD_BLOCK # perhaps this was an append
> DELETE
> (about 1 hour later) CLOSE
> It is interesting that there was no CLOSE logged before the delete.
> {quote}
> Grepping for that file name shows that the close was triggered by
> {{LeaseManager}} when the lease reached its hard limit.
> {noformat}
> 2017-08-16 15:05:45,927 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease. Holder: DFSClient_NONMAPREDUCE_-1997177597_28, pending creates: 75], src=/home/Events/CancellationSurvey_MySQL/2015/12/31/.part-00000.9nlJ3M
> 2017-08-16 15:05:45,927 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file /home/Events/CancellationSurvey_MySQL/2015/12/31/.part-00000.9nlJ3M closed.
> {noformat}
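
To illustrate why the edit sequence quoted in the description cannot be replayed, here is a minimal sketch in Java. It is a toy model, not the NameNode's actual loader: the class EditReplaySketch, the Op enum, and the replay method are hypothetical names, and the only state tracked is whether the path currently exists in the namespace. Under that assumption, any op that needs the file (ADD_BLOCK or CLOSE) fails once a DELETE has been applied, which is what the late CLOSE logged by hard lease recovery runs into.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: tracks a single path and whether it exists.
// All names here are hypothetical and are not Hadoop APIs.
class EditReplaySketch {
    enum Op { OPEN, ADD_BLOCK, CLOSE, DELETE }

    /**
     * Replays the ops for one path in order.
     * Returns the index of the first op that refers to a missing file,
     * or -1 if the whole sequence replays cleanly.
     */
    static int replay(List<Op> ops) {
        boolean exists = false;
        for (int i = 0; i < ops.size(); i++) {
            switch (ops.get(i)) {
                case OPEN:
                    exists = true;   // file is created in the namespace
                    break;
                case DELETE:
                    exists = false;  // file is removed from the namespace
                    break;
                case ADD_BLOCK:
                case CLOSE:
                    if (!exists) {
                        return i;    // analogous to "File does not exist"
                    }
                    break;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Sequence from the quoted analysis; the last CLOSE came about 1 hour later.
        List<Op> observed = Arrays.asList(
            Op.OPEN, Op.ADD_BLOCK, Op.CLOSE, Op.ADD_BLOCK, Op.DELETE, Op.CLOSE);
        System.out.println(replay(observed)); // prints 5
    }
}
```

In this model the failing op is the final CLOSE at index 5, matching the observation that no CLOSE was logged before the DELETE and that the late CLOSE is the one that breaks loading.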
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)