[ https://issues.apache.org/jira/browse/HDFS-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286080#comment-13286080 ]
Colin Patrick McCabe commented on HDFS-3479:
--------------------------------------------
Hi Nicholas,
In the patch it says "We don't check the last two megabytes of the edit log, in
case the NameNode crashed while writing to the edit log."
Basically, if we crash while writing to the end of the log, the underlying
filesystem does not give us the guarantees we would need to get every byte
perfect. Consider the following sequence of events:
1. NN allocates an extra 1 MB at the end of the file and fills it with 0xff
bytes.
2. NN writes an opcode to the edit log file. It happens to span two sectors on
the hard disk.
3. The kernel writes the second half of the opcode to disk, but not yet the
first half.
4. The system crashes.
In this case, we're left with a file that looks like this:
{code}
0xff 0xff 0xff 0xff ... [opcode bytes]... 0xff 0xff 0xff
{code}
This would clearly fail validation, so the NameNode would fail to start,
even though no data has been lost (the opcode was never acked to the client).
That would be a serious problem. UNCHECKED_REGION_LENGTH fixes it by leaving
the tail of the log, where such a torn write can occur, out of the check.
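To make the effect concrete, here is a rough Java sketch. It is not the code in the patch. It assumes the padding byte is 0xff, as in the example above; the class name, the `firstNonPaddingOffset` helper, the 32-byte buffer, and the 16-byte unchecked region are all made up for illustration (the patch speaks of the last two megabytes).

```java
import java.util.Arrays;

public class PaddingCheckSketch {
    // Assumed padding byte, matching the 0xff fill described in the comment.
    static final byte PADDING = (byte) 0xff;

    /**
     * Scans data[from .. data.length - uncheckedLength) and returns the offset
     * of the first byte that is not padding, or -1 if every checked byte is
     * padding. Bytes in the final uncheckedLength bytes are never inspected.
     */
    static int firstNonPaddingOffset(byte[] data, int from, int uncheckedLength) {
        int end = Math.max(from, data.length - uncheckedLength);
        for (int i = from; i < end; i++) {
            if (data[i] != PADDING) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical tiny "edit log": valid ops end at offset 16,
        // everything after that was preallocated as 0xff padding.
        byte[] log = new byte[32];
        Arrays.fill(log, PADDING);

        // Torn write: only the second half of an opcode reached the disk.
        log[20] = 0x01;
        log[21] = 0x02;

        // Checking every byte after the last valid op flags the stray bytes.
        System.out.println(firstNonPaddingOffset(log, 16, 0));   // prints 20

        // Skipping the last 16 bytes ignores them, so validation passes.
        System.out.println(firstNonPaddingOffset(log, 16, 16));  // prints -1
    }
}
```

The trade-off is that real corruption inside the unchecked tail also goes unnoticed; the sketch only shows why a torn write there no longer blocks startup.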
We can't control the order in which the kernel flushes sectors out of the
buffer cache and onto the hard disk. We can set up barriers (that is what
fsync is), but the ordering of writes between barriers is beyond us.
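For reference, this is roughly what such a barrier looks like from Java, using `FileChannel.force`. It is only a sketch: the class name, the `appendDurably` helper, and the temp file are hypothetical and do not come from the NameNode code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FsyncBarrierSketch {
    /**
     * Appends a record, then asks the OS to force it to the storage device.
     * Before force() returns, the kernel may flush the dirty sectors in any
     * order, so a crash in that window can leave a partially written record.
     */
    static void appendDurably(FileChannel ch, byte[] record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record);
        while (buf.hasRemaining()) {
            ch.write(buf);      // may only reach the buffer cache
        }
        ch.force(true);         // barrier: flush file data and metadata
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("edits", ".log");
        try (FileChannel ch = FileChannel.open(
                p, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            appendDurably(ch, new byte[] {1, 2, 3, 4});
        }
        System.out.println(Files.size(p));
        Files.delete(p);
    }
}
```

An op would only be acked once `appendDurably` has returned, which is why a record torn by a crash before that point was never visible to a client.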
> backport HDFS-3335 (check for edit log corruption at the end of the log) to
> branch-1
> ------------------------------------------------------------------------------------
>
> Key: HDFS-3479
> URL: https://issues.apache.org/jira/browse/HDFS-3479
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-3335-b1.005.patch
>