[ https://issues.apache.org/jira/browse/HDFS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276201#comment-13276201 ]
Colin Patrick McCabe commented on HDFS-3049:
--------------------------------------------
bq. Will [the RedundantEditLogInputStream error message] have already logged
the offset of the error? Or will the exception itself contain the offset?
Otherwise we should include it in the error message.
Yes, the exception itself contains the offsets, as generated by
FSEditLogLoader. So there's no need to re-add them here.
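To illustrate the pattern with a minimal, self-contained sketch (the types and
method names below are stand-ins, not the actual FSEditLogLoader or
EditLogInputStream code): the loader attaches the stream name and offset when
it rethrows, so RedundantEditLogInputStream only has to propagate it.
{code}
import java.io.IOException;

// Hypothetical sketch of the error-reporting pattern only; these types are
// stand-ins, not the real FSEditLogLoader / EditLogInputStream classes.
class EditReplaySketch {
  interface LogStream {
    String getName();                         // human-readable source, e.g. a path
    boolean hasNext() throws IOException;
    long readAndApplyOp() throws IOException; // returns bytes consumed by one op
  }

  static void replay(LogStream in) throws IOException {
    long offset = 0;
    try {
      while (in.hasNext()) {
        offset += in.readAndApplyOp();
      }
    } catch (IOException e) {
      // The rethrown exception itself carries the location of the failure,
      // so wrappers higher up do not need to re-add it.
      throw new IOException("Error replaying edit log " + in.getName()
          + " at offset " + offset, e);
    }
  }
}
{code}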
bq. getPosition() in the merged stream now returns the position of the
underlying stream, which increases as we read one file and then resets back to
zero. But, in FSEditLog, we track these offsets for error reporting purposes.
We need to make sure that, if there is an unrecoverable corruption, the log
messages specifically identify the path and offset of the corruption. I'm not
sure that's the case, now that we have the extra abstraction here. Can you try
using a single storage dir and corrupting the logs somewhere in a middle
segment?
I think EditLogInputStream#getPosition just needs to go away and be replaced by
a function that gives a human-readable description of "where you are." This is
especially true because we're soon going to have some edit logs like the quorum
edit logs where there is no real concept of file position. I'm not sure if I'm
brave enough to try to cram that into this change, though.
I will run through a recovery scenario and make sure the current printout makes
sense and can be followed. I'm pretty sure it does, but it's good to
check.
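Roughly what I have in mind, as a sketch with hypothetical method names (not
something in this patch):
{code}
// Hypothetical sketch only; the replacement method name is illustrative.
abstract class PositionReportingSketch {
  // Today: a raw byte offset that resets at each segment boundary and has no
  // meaning for non-file-backed sources such as a quorum journal.
  public abstract long getPosition();

  // Possible replacement: a human-readable "where you are" string, e.g.
  // "/data/1/dfs/name/current/edits_..., offset 88213" or
  // "quorum journal, txid 3142", suitable for error messages.
  public abstract String getPositionDescription();
}
{code}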
bq. Regarding memory usage: I'm afraid that each stream opened will end up
maintaining a large buffer, since it's generally wrapped with
BufferedInputStream, and we use mark(100MB). Maybe we should close each stream
as soon as we finish with it, rather than waiting until the close() call at the
end. Have you tested loading a large edit log composed of many segments, e.g. a
total 1GB log made of ten 100MB segments, on a NN with, say, a 1GB heap?
Good catch. I think I have a solution for this one. Will post shortly.
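The rough shape of what I'm thinking of, as a hypothetical sketch (stand-in
names, not the actual patch):
{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

// Hypothetical sketch: close each underlying stream (and release its large
// BufferedInputStream mark buffer) as soon as it is exhausted, instead of
// holding every stream open until the final close().
class SequentialStreamSketch implements Closeable {
  private final List<Closeable> streams;   // stand-ins for the wrapped streams
  private int cur = 0;

  SequentialStreamSketch(List<Closeable> streams) {
    this.streams = streams;
  }

  /** Called when the current stream hits EOF: free its buffer immediately. */
  void finishedCurrentStream() throws IOException {
    streams.get(cur).close();
    cur++;
  }

  @Override
  public void close() throws IOException {
    // Only streams we never reached are still open at this point.
    for (int i = cur; i < streams.size(); i++) {
      streams.get(i).close();
    }
  }
}
{code}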
> During the normal loading NN startup process, fall back on a different
> EditLog if we see one that is corrupt
> ------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-3049
> URL: https://issues.apache.org/jira/browse/HDFS-3049
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Minor
> Attachments: HDFS-3049.001.patch, HDFS-3049.002.patch,
> HDFS-3049.003.patch, HDFS-3049.005.against3335.patch,
> HDFS-3049.006.against3335.patch, HDFS-3049.007.against3335.patch,
> HDFS-3049.010.patch, HDFS-3049.011.patch
>
>
> During the NameNode startup process, we load an image, and then apply edit
> logs to it until we believe that we have all the latest changes.
> Unfortunately, if there is an I/O error while reading any of these files, in
> most cases, we simply abort the startup process. We should try harder to
> locate a readable edit log and/or image file.
> *There are three main use cases for this feature:*
> 1. If the operating system does not honor fsync (usually due to a
> misconfiguration), a file may end up in an inconsistent state.
> 2. In certain older releases where we did not use fallocate() or similar to
> pre-reserve blocks, a disk full condition may cause a truncated log in one
> edit directory.
> 3. There may be a bug in HDFS which results in some of the data directories
> receiving corrupt data, but not all. This is the least likely use case.
> *Proposed changes to normal NN startup*
> * We should try a different FSImage if we can't load the first one we try.
> * We should examine other FSEditLogs if we can't load the first one(s) we try (see the sketch below).
> * We should fail if we can't find EditLogs that would bring us up to what we
> believe is the latest transaction ID.
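> For illustration, a rough, self-contained sketch of this fallback behavior (all names here are hypothetical, not an actual implementation):
> {code}
> import java.io.IOException;
> import java.util.List;
>
> // Hypothetical sketch of the proposed normal-startup fallback: for each log
> // segment, try every redundant copy before giving up, and fail if the edits
> // we managed to load do not reach the expected latest transaction ID.
> class FallbackLoadSketch {
>   interface LogCopy {                          // stand-in for one edit log stream
>     long applyEdits(long fromTxId) throws IOException; // returns last txid applied
>   }
>
>   static void load(long expectedLastTxId, List<List<LogCopy>> segments)
>       throws IOException {
>     long txid = 0;
>     for (List<LogCopy> copies : segments) {    // one inner list per log segment
>       IOException lastError = null;
>       boolean loaded = false;
>       for (LogCopy copy : copies) {            // one entry per storage directory
>         try {
>           txid = copy.applyEdits(txid + 1);
>           loaded = true;
>           break;
>         } catch (IOException e) {
>           lastError = e;                       // remember why this copy failed
>         }
>       }
>       if (!loaded) {
>         throw new IOException("No readable copy of this log segment", lastError);
>       }
>     }
>     if (txid < expectedLastTxId) {
>       // In normal startup we refuse to come up short of the latest txid.
>       throw new IOException("Edit logs end at txid " + txid
>           + " but expected " + expectedLastTxId);
>     }
>   }
> }
> {code}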
> *Proposed changes to recovery mode NN startup*
> We should list all the available storage directories and allow the
> operator to select which one to use.
> Something like this:
> {code}
> Multiple storage directories found.
> 1. /foo/bar
> edits__current__XYZ size:213421345 md5:2345345
> image size:213421345 md5:2345345
> 2. /foo/baz
> edits__current__XYZ size:213421345 md5:2345345345
> image size:213421345 md5:2345345
> Which one would you like to use? (1/2)
> {code}
> As usual in recovery mode, we want to be flexible about error handling. In
> this case, this means that we should NOT fail if we can't find EditLogs that
> would bring us up to what we believe is the latest transaction ID.
> *Not addressed by this feature*
> This feature will not address the case where an attempt to access the
> NameNode name directory or directories hangs because of an I/O error. This
> may happen, for example, when trying to load an image from a hard-mounted NFS
> directory, when the NFS server has gone away. Just as now, the operator will
> have to notice this problem and take steps to correct it.