[
https://issues.apache.org/jira/browse/HDFS-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272929#comment-13272929
]
Todd Lipcon commented on HDFS-3335:
-----------------------------------
In {{EditLogFileInputStream.nextOp}}, we should log a WARN message with the
file name and data on how many bytes are skipped at the end of the file. This
way, if there is an error replaying later, you might notice that in fact you
did want to recover some of these edits. Having the warning in the log will
make it easier to find where they went.
In this place, it would also be nice to detect how many of those bytes were
just 0xffffffff padding vs data that potentially looks like transactions.
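To make the suggestion concrete, here is a rough sketch of the counting I have in mind. Everything in it is hypothetical and not taken from the patch: the {{TrailingByteCounts}} class, its method names, and the message wording are made up for illustration, and it assumes padding bytes are 0xFF (the same value as the OP_INVALID opcode). It also works on an in-memory byte array rather than the real input stream.

```java
/** Hypothetical helper (not in the patch): classifies the bytes found after OP_INVALID. */
final class TrailingByteCounts {
  /** Assumption: padding bytes are 0xFF, the same value as the OP_INVALID opcode. */
  static final byte PADDING = (byte) 0xFF;

  final long paddingBytes;
  final long nonPaddingBytes;

  private TrailingByteCounts(long paddingBytes, long nonPaddingBytes) {
    this.paddingBytes = paddingBytes;
    this.nonPaddingBytes = nonPaddingBytes;
  }

  /** Counts padding and non-padding bytes in data[start .. data.length). */
  static TrailingByteCounts count(byte[] data, int start) {
    long padding = 0;
    long nonPadding = 0;
    for (int i = start; i < data.length; i++) {
      if (data[i] == PADDING) {
        padding++;
      } else {
        nonPadding++;
      }
    }
    return new TrailingByteCounts(padding, nonPadding);
  }

  /** Builds the text of the WARN message; the wording is illustrative only. */
  String describe(String fileName) {
    return String.format(
        "%s: skipped %d bytes after OP_INVALID (%d padding, %d possibly transaction data)",
        fileName, paddingBytes + nonPaddingBytes, paddingBytes, nonPaddingBytes);
  }
}
```

The result of {{describe}} would be passed to the existing logger at WARN level. A nonzero non-padding count is the case worth a closer look before discarding the tail of the file.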
----
- Rename {{GarbageAfterTerminatorException.getOffset}} to something a little
clearer -- right now it's not obvious that this is a relative offset/length
after the OP_INVALID, versus an offset from the beginning of the file, etc.
Perhaps {{getPaddingLengthAfterEofMarker}}? I'm still not entirely clear what
this length represents... by my reading of the javadoc, it is:
{code}
<--- valid edits ---> < OP_INVALID > <-- N bytes of padding --> <-- non-padding data --> EOF
{code}
where {{N}} above is what you're talking about?
Maybe some ASCII art like the above in the javadoc would be helpful.
Part of what is confusing me is this: does padding after OP_INVALID count as
garbage or not?
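For illustration, this is roughly what the renamed getter plus a layout diagram in the javadoc could look like. It is only a sketch: it assumes my reading above is right (the value is N, the padding length after OP_INVALID), and the constructor signature, the field name, and the {{IOException}} superclass are my guesses rather than what the patch has.

```java
import java.io.IOException;

/**
 * Sketch only: thrown when non-padding data is found after the end-of-log marker.
 *
 * Assumed file layout:
 *
 *   [ valid edits ][ OP_INVALID ][ N bytes of padding ][ non-padding data ] EOF
 */
class GarbageAfterTerminatorException extends IOException {
  private static final long serialVersionUID = 1L;

  /** N in the diagram above (assumed meaning, to be confirmed by the patch author). */
  private final long paddingLengthAfterEofMarker;

  GarbageAfterTerminatorException(String message, long paddingLengthAfterEofMarker) {
    super(message);
    this.paddingLengthAfterEofMarker = paddingLengthAfterEofMarker;
  }

  /**
   * @return the number of padding bytes between the OP_INVALID marker and the
   *         first non-padding byte; relative to the marker, not to the start
   *         of the file
   */
  long getPaddingLengthAfterEofMarker() {
    return paddingLengthAfterEofMarker;
  }
}
```

If the length actually means something else, the same pattern applies: name the getter after the region it measures and say in the javadoc which point it is measured from.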
----
{code}
+ /** Testing hook */
+ void setEditLog(FSEditLog newLog) {
{code}
Can you add {{@VisibleForTesting}} and rename it to {{setEditLogForTesting}} so
no one starts to use it in non-test code?
----
- Lots of spurious whitespace changes in TestNameNodeRecovery
- Can you add brief javadoc to the three implementations of Corruptor? e.g.
"/** Truncate the last byte of the file */", "/** Add padding followed by some
non-padding bytes to the end of the file */" and "/** Add only padding to the
end of the file */"?
Otherwise really nice tests.
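As an illustration of the javadoc request, here is a simplified sketch of three corruptors carrying those comments. The class names, the {{byte[]}}-based {{corrupt}} signature, the padding length, and the 0xFF padding value are all assumptions made so the example is self-contained; the real test operates on the edit log file and its interface may look different.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the test's Corruptor; works on in-memory bytes, not a file. */
interface Corruptor {
  /** Returns a corrupted copy of the given edit log contents. */
  byte[] corrupt(byte[] edits);
}

/** Truncate the last byte of the file. */
class TruncatingCorruptor implements Corruptor {
  @Override
  public byte[] corrupt(byte[] edits) {
    return Arrays.copyOf(edits, Math.max(0, edits.length - 1));
  }
}

/** Add padding followed by some non-padding bytes to the end of the file. */
class PaddingThenGarbageCorruptor implements Corruptor {
  @Override
  public byte[] corrupt(byte[] edits) {
    byte[] out = Arrays.copyOf(edits, edits.length + 6);
    // Four assumed 0xFF padding bytes, then two arbitrary non-padding bytes.
    Arrays.fill(out, edits.length, edits.length + 4, (byte) 0xFF);
    out[edits.length + 4] = 0x0D;
    out[edits.length + 5] = 0x0E;
    return out;
  }
}

/** Add only padding to the end of the file. */
class PaddingOnlyCorruptor implements Corruptor {
  @Override
  public byte[] corrupt(byte[] edits) {
    byte[] out = Arrays.copyOf(edits, edits.length + 4);
    Arrays.fill(out, edits.length, out.length, (byte) 0xFF);
    return out;
  }
}
```

With one-line javadoc like this on each class, someone reading a test failure can tell which end-of-file condition was being exercised without reading the method body.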
> check for edit log corruption at the end of the log
> ---------------------------------------------------
>
> Key: HDFS-3335
> URL: https://issues.apache.org/jira/browse/HDFS-3335
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 0.23.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-3335-b1.001.patch, HDFS-3335-b1.002.patch,
> HDFS-3335-b1.003.patch, HDFS-3335-b1.004.patch, HDFS-3335.001.patch,
> HDFS-3335.002.patch, HDFS-3335.003.patch, HDFS-3335.004.patch,
> HDFS-3335.005.patch, HDFS-3335.006.patch, HDFS-3335.007.patch
>
>
> Even after encountering an OP_INVALID, we should check the end of the edit
> log to make sure that it contains no more edits.
> This will catch things like rare race conditions or log corruptions that
> would otherwise remain undetected. They will go from being silent data loss
> scenarios to being cases that we can detect and fix.
> Using recovery mode, we can choose to ignore the end of the log if necessary.