[
https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aaron T. Myers updated HDFS-6647:
---------------------------------
Attachment: HDFS-6647-failing-test.patch
I'm attaching a test case which illustrates the problem. When this problem
occurs, the NN is unable to read the edit log and fails to start, with an error
like the following:
{noformat}
java.io.FileNotFoundException: File does not exist: /test-file
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:444)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:227)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:136)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:676)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:279)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:964)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:711)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:530)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:586)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:752)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:736)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1412)
{noformat}
The sequence of events I've identified that can cause this is the following:
# A file is opened for write and some data has been written/flushed to it,
causing a block to be allocated.
# A snapshot is taken which includes the file.
# The file is deleted from the present file system, though the client has not
yet closed the file. This will log an OP_DELETE to the edit log.
# Some error happens, triggering pipeline recovery, which logs an
OP_UPDATE_BLOCKS to the edit log.
The reason it's possible for this to happen is basically because the
{{updatePipeline}} RPC never checks if the file actually exists, but instead
just finds the file INode based on the block ID being replaced in the pipeline.
Later, when we're reading the {{OP_UPDATE_BLOCKS}} from the edit log, however,
we try to find the file INode based on the path name of the file, which no
longer exists because of the previous delete.
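To make the mismatch concrete, here is a toy model of the two lookup strategies. It is only a sketch and not actual NameNode code: the class {{UpdateBlocksLookupSketch}}, its two maps, and all of its method names are hypothetical stand-ins for the directory tree and the block map. The assumption it encodes is that deleting a file that is still held by a snapshot removes the path from the present namespace while the block-to-INode mapping survives.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

/** Toy model of path-based vs. block-ID-based INode lookup. Not real HDFS code. */
class UpdateBlocksLookupSketch {
    // Hypothetical stand-ins for the present namespace and the block map.
    private final Map<String, Long> inodeByPath = new HashMap<>();
    private final Map<Long, Long> inodeByBlockId = new HashMap<>();

    /** Open a file for write and allocate one block for it. */
    public void createFileWithBlock(String path, long inodeId, long blockId) {
        inodeByPath.put(path, inodeId);
        inodeByBlockId.put(blockId, inodeId);
    }

    /** Delete from the present namespace; assume a snapshot keeps the INode and block alive. */
    public void deleteButKeepInSnapshot(String path) {
        inodeByPath.remove(path);
    }

    /** Lookup by path, as done when replaying the op from the edit log. */
    public long findByPath(String path) throws FileNotFoundException {
        Long inode = inodeByPath.get(path);
        if (inode == null) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        return inode;
    }

    /** Lookup by block ID, as done when the pipeline is updated. */
    public long findByBlockId(long blockId) {
        Long inode = inodeByBlockId.get(blockId);
        if (inode == null) {
            throw new IllegalStateException("Unknown block: " + blockId);
        }
        return inode;
    }

    public static void main(String[] args) {
        UpdateBlocksLookupSketch ns = new UpdateBlocksLookupSketch();
        ns.createFileWithBlock("/test-file", 1001L, 42L);
        ns.deleteButKeepInSnapshot("/test-file");

        // The block-ID lookup still succeeds after the delete.
        System.out.println("by block ID: inode " + ns.findByBlockId(42L));

        // The path lookup for the same file fails.
        try {
            ns.findByPath("/test-file");
        } catch (FileNotFoundException e) {
            System.out.println("by path: " + e.getMessage());
        }
    }
}
```

In this model the write side and the replay side disagree about whether the file can be found, which is the same shape as the failure in the stack trace above.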
It's not entirely obvious to me what the right solution to this issue should
be. It shouldn't be difficult to change the {{FSEditLogLoader}} to be able to
read the {{OP_UPDATE_BLOCKS}} op if we just change it to look up the INode by
block ID. On the other hand, however, I'm not entirely sure we should even be
allowing this sequence of edit log ops in the first place. It doesn't seem
unreasonable to me that we might check that the file actually exists in the
present file system in the {{updatePipeline}} RPC call and throw an error if it
doesn't, since continuing to write to a file that only exists in a snapshot
doesn't make much sense. Along similar lines, it seems a little odd to me that
an INode that only exists in the snapshot would continue to be considered
under-construction, but perhaps that's not unreasonable in itself.
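For discussion purposes, the second option (rejecting the pipeline update when the file is gone from the present file system) might look roughly like the sketch below. This is a toy model under stated assumptions, not a proposed patch: {{UpdatePipelineGuardSketch}}, its fields, and its methods are all hypothetical, and the edit log is modelled as a plain list of strings.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of an existence check before logging OP_UPDATE_BLOCKS. Not real HDFS code. */
class UpdatePipelineGuardSketch {
    // Hypothetical stand-ins: block map, present namespace, and edit log.
    private final Map<Long, String> pathByBlockId = new HashMap<>();
    private final Set<String> livePaths = new HashSet<>();
    private final List<String> editLog = new ArrayList<>();

    public void create(String path, long blockId) {
        livePaths.add(path);
        pathByBlockId.put(blockId, path);
    }

    /** Delete from the present namespace; assume a snapshot keeps the block mapping alive. */
    public void delete(String path) {
        livePaths.remove(path);
        editLog.add("OP_DELETE " + path);
    }

    /** Resolve the file by block ID, but refuse to log anything if the path is gone. */
    public void updatePipeline(long blockId) throws FileNotFoundException {
        String path = pathByBlockId.get(blockId);
        if (path == null) {
            throw new FileNotFoundException("No file for block " + blockId);
        }
        if (!livePaths.contains(path)) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        editLog.add("OP_UPDATE_BLOCKS " + path);
    }

    public List<String> editLog() {
        return editLog;
    }
}
```

With a check like this, the client would get an error during pipeline recovery and no OP_UPDATE_BLOCKS would follow the OP_DELETE in the log. Whether that is the right behavior for a file that still exists in a snapshot is exactly the open question.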
Would love to hear others' thoughts on this.
> Edit log corruption when pipeline recovery occurs for deleted file present in
> snapshot
> --------------------------------------------------------------------------------------
>
> Key: HDFS-6647
> URL: https://issues.apache.org/jira/browse/HDFS-6647
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, snapshots
> Affects Versions: 2.4.1
> Reporter: Aaron T. Myers
> Attachments: HDFS-6647-failing-test.patch
>
>
> I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the
> edit log for a file after an OP_DELETE has previously been logged for that
> file. Such an edit log sequence cannot then be successfully read by the
> NameNode.
> More details in the first comment.