[
https://issues.apache.org/jira/browse/HDFS-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304855#comment-16304855
]
Xiao Chen commented on HDFS-12369:
----------------------------------
As it turned out, this issue can surface with different symptoms, depending on
the file's status at the time of recovery. At the core of the issue, lease
recovery of a deleted file should not write anything to the edit log, which is
what this jira fixed.
{noformat}
2017-12-27 15:38:57,360 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/filename, replication=3, mtime=1514301485930, atime=1514297585263, blockSize=268435456, blocks=[blk_1863506432_791165194, blk_1863506631_791165393, blk_1863506826_791165588], permissions=hdfs:superuser:rw-r--r--, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, txid=10577364851]
java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1863518793_791177559 with blk_1863506432_791165194 as block # 0/3 of /filename
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:942)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:434)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:897)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:750)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:318)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1125)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:789)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
{noformat}
This happened when a file with the same name was created (1st time) -> deleted
-> created by a 2nd user -> lease recovered (from the 1st creation) -> closed by
the 2nd user, resulting in 2 close ops in the edits.
> Edit log corruption due to hard lease recovery of not-closed file which has
> snapshots
> -------------------------------------------------------------------------------------
>
> Key: HDFS-12369
> URL: https://issues.apache.org/jira/browse/HDFS-12369
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.3
>
> Attachments: HDFS-12369.01.patch, HDFS-12369.02.patch,
> HDFS-12369.03.patch, HDFS-12369.test.patch
>
>
> HDFS-6257 and HDFS-7707 worked hard to prevent corruption from combinations
> of client operations.
> Recently, we observed the NN failing to start with the following exception:
> {noformat}
> 2017-08-17 14:32:18,418 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> java.io.FileNotFoundException: File does not exist: /home/Events/CancellationSurvey_MySQL/2015/12/31/.part-00000.9nlJ3M
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:429)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:897)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:750)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:318)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1125)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:789)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {noformat}
> Quoting a nice analysis of the edits:
> {quote}
> In the edits logged about 1 hour later, we see this failing OP_CLOSE. The
> sequence in the edits shows the file going through:
> OPEN
> ADD_BLOCK
> CLOSE
> ADD_BLOCK # perhaps this was an append
> DELETE
> (about 1 hour later) CLOSE
> It is interesting that there was no CLOSE logged before the delete.
> {quote}
> Grepping for that file name, it turns out the close was triggered by
> {{LeaseManager}} when the lease reached its hard limit.
> {noformat}
> 2017-08-16 15:05:45,927 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease. Holder: DFSClient_NONMAPREDUCE_-1997177597_28, pending creates: 75], src=/home/Events/CancellationSurvey_MySQL/2015/12/31/.part-00000.9nlJ3M
> 2017-08-16 15:05:45,927 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file /home/Events/CancellationSurvey_MySQL/2015/12/31/.part-00000.9nlJ3M closed.
> {noformat}
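The hard-limit timing also explains why the stray OP_CLOSE appears about an
hour after the DELETE in the edits. A minimal sketch, assuming the stock HDFS
default of a 1-hour hard lease limit (names below are illustrative, not the
actual {{LeaseManager}} code):

```java
// Illustrative sketch of the hard-limit expiry check behind the LeaseManager's
// background lease recovery. The 1-hour constant matches the stock HDFS hard
// lease limit; method and class names are assumptions for this sketch.
public class HardLimitCheck {
    static final long HARD_LIMIT_MS = 60L * 60 * 1000; // 1 hour

    // A lease passes the hard limit once the holder has not renewed it for
    // an hour; the NameNode then force-closes the file (internalReleaseLease),
    // logging an OP_CLOSE regardless of what happened to the path meanwhile.
    static boolean expiredHardLimit(long lastRenewedMs, long nowMs) {
        return nowMs - lastRenewedMs > HARD_LIMIT_MS;
    }

    public static void main(String[] args) {
        long renewed = 0L;
        System.out.println(expiredHardLimit(renewed, 30L * 60 * 1000)); // false
        System.out.println(expiredHardLimit(renewed, 61L * 60 * 1000)); // true
    }
}
```

So a client that died without closing the file leaves a lease that only gets
recovered an hour later, by which point the file may already have been deleted
(and even re-created), as in this jira.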
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)