[ https://issues.apache.org/jira/browse/HDFS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287803#comment-13287803 ]
Todd Lipcon commented on HDFS-3049:
-----------------------------------
{code}
+ * We wil currently try each edit log stream exactly once. In other words, we
{code}
typo: 'wil'
----
- please add a blank line before and after the definition of the {{State}} enum
- there are some WATERMELONs in your patch I think you should probably remove :)
- within a given RedundantEditLogInputStream, there's an expectation that all of
the underlying streams have the same start txid, right? You should check this
in your preconditions loop
- not sure of the logic for EOF: let's say I have two streams, one is tx 1-15,
and the other is 1-20. When we sort, they'll be in the order (1-20, 1-15). I
then encounter an error at txid #5 in the first stream, so I switch to the
second stream. This stream will then return "null" after reading txid #15, even
though there are really 5 more txns in the group. Right?
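One possible way to address the last two points is sketched here under stated assumptions. It checks the start txids up front, remembers the highest txid any stream in the group claims to hold, and treats running out before that txid as a failure instead of a clean EOF. The classes RedundantStreamSketch and FakeStream are hypothetical stand-ins, not the patch's RedundantEditLogInputStream API. A fake stream simply yields txids and throws at a chosen one.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the patch): a group of redundant edit log streams
 * that fails over on error without silently truncating the group.
 */
class RedundantStreamSketch {
  /** Fake stream: yields txids first..last, throwing at failAt (-1 = never). */
  static final class FakeStream {
    final String name;
    final long first, last, failAt;
    long next;

    FakeStream(String name, long first, long last, long failAt) {
      this.name = name;
      this.first = first;
      this.last = last;
      this.failAt = failAt;
      this.next = first;
    }

    /** Returns the next txid, or null at the end of this stream. */
    Long read() throws IOException {
      if (next > last) return null;
      if (next == failAt) throw new IOException(name + ": corrupt op at txid " + next);
      return next++;
    }
  }

  private final List<FakeStream> streams;
  private final long groupLastTxId; // highest txid any stream claims to hold
  private int cur = 0;
  private long lastRead; // last txid handed to the caller

  RedundantStreamSketch(List<FakeStream> input) {
    if (input.isEmpty()) throw new IllegalArgumentException("no streams");
    long start = input.get(0).first;
    long max = Long.MIN_VALUE;
    for (FakeStream s : input) {
      // The precondition from the review: all streams share one start txid.
      if (s.first != start) {
        throw new IllegalArgumentException("stream " + s.name + " starts at txid "
            + s.first + ", expected " + start);
      }
      max = Math.max(max, s.last);
    }
    streams = new ArrayList<>(input);
    groupLastTxId = max;
    lastRead = start - 1;
  }

  /** Returns the next txid, or null only when no stream can supply more. */
  Long next() throws IOException {
    while (true) {
      FakeStream s = streams.get(cur);
      Long txid;
      try {
        txid = s.read();
      } catch (IOException e) {
        failOver(e);
        continue;
      }
      if (txid == null) {
        if (lastRead >= groupLastTxId) return null; // genuine end of the group
        failOver(new IOException(s.name + " ended after txid " + lastRead));
        continue;
      }
      if (txid <= lastRead) continue; // already returned from an earlier stream
      lastRead = txid;
      return txid;
    }
  }

  /** Switches to a later stream that claims to hold txid lastRead + 1, or throws. */
  private void failOver(IOException cause) throws IOException {
    for (int i = cur + 1; i < streams.size(); i++) {
      if (streams.get(i).last > lastRead) {
        cur = i;
        return;
      }
    }
    throw new IOException(String.format("Could not find any other edit log which "
        + "contains transactions following txid %d", lastRead), cause);
  }
}
```

Consider the (1-20, 1-15) example with an error at txid 5 in the first stream. This sketch returns txids 1 through 15 and then throws, rather than returning null. It never goes back to a stream that has already failed, which is a simplification. A real implementation might instead try to resync past the bad region.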
----
{code}
+ streams[curIdx].getName() + ". During automatic failover, " +
+ "we noticed that all of the remaining edit log streams are " +
+ "shorter than the current one! The best " +
{code}
I don't like using the term "automatic failover" here - because that's the
terminology we use for HA. Instead, perhaps something like "We could not find
any other edit log which contains transactions following txid %d"?
----
{code}
+ LOG.error("Got error reading edit log input stream " +
+ streams[curIdx].getName(), prevException);
+ LOG.error("failing over to edit log " +
+ streams[curIdx + 1].getName());
{code}
Combine these into one log message
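Here is a minimal sketch of the combined message. It assumes java.util.logging as a stand-in for the project's logger, and FailoverLogSketch and its methods are hypothetical names:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper; java.util.logging stands in for the project's logger. */
class FailoverLogSketch {
  private static final Logger LOG = Logger.getLogger(FailoverLogSketch.class.getName());

  /** One message naming both the stream that failed and the one we switch to. */
  static String failoverMessage(String failedStream, String nextStream) {
    return "Got error reading edit log input stream " + failedStream
        + "; failing over to edit log " + nextStream;
  }

  /** Logs the failover once, with the triggering exception attached. */
  static void logFailover(String failedStream, String nextStream, Throwable cause) {
    LOG.log(Level.SEVERE, failoverMessage(failedStream, nextStream), cause);
  }
}
```

Passing the exception to the same call keeps the cause and the failover target in a single log record.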
----
I find the state machine here somewhat confusing. Is there no clearer way to
write it? Maybe an ascii art transition diagram would help, or at least for
each state a list of which states it can transition to, and under what
circumstances? To paraphrase someone or other, it's "not obviously incorrect"
but also not "obviously correct" :)
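This is a hypothetical sketch of the "for each state, which states it can transition to" idea. The state names and transitions below are illustrative only and are not taken from the patch. Encoding the table in an EnumMap lets each transition be checked at runtime, and the table doubles as documentation.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

/** Hypothetical states for a redundant edit log reader (names are illustrative). */
enum ReaderState { SKIP_UNTIL, OK, STREAM_FAILED, EOF }

/** Transition table: for each state, the states it may move to, and why. */
final class ReaderTransitions {
  private static final Map<ReaderState, EnumSet<ReaderState>> ALLOWED =
      new EnumMap<>(ReaderState.class);

  static {
    // SKIP_UNTIL: discarding ops already returned by a previous stream.
    //   -> OK            once prevTxId + 1 is reached
    //   -> STREAM_FAILED on an I/O error, or if the stream ends first
    ALLOWED.put(ReaderState.SKIP_UNTIL, EnumSet.of(ReaderState.OK, ReaderState.STREAM_FAILED));
    // OK: returning ops from the current stream.
    //   -> OK            after returning another op
    //   -> STREAM_FAILED on an I/O error, or if the stream ends while another
    //                    stream still claims later txids
    //   -> EOF           when no stream has anything left
    ALLOWED.put(ReaderState.OK,
        EnumSet.of(ReaderState.OK, ReaderState.STREAM_FAILED, ReaderState.EOF));
    // STREAM_FAILED: the current stream is abandoned.
    //   -> SKIP_UNTIL    after switching to another stream that holds prevTxId + 1
    //   (if there is none, the reader throws instead of changing state)
    ALLOWED.put(ReaderState.STREAM_FAILED, EnumSet.of(ReaderState.SKIP_UNTIL));
    // EOF: terminal, no outgoing transitions.
    ALLOWED.put(ReaderState.EOF, EnumSet.noneOf(ReaderState.class));
  }

  static boolean isAllowed(ReaderState from, ReaderState to) {
    return ALLOWED.get(from).contains(to);
  }

  /** Returns {@code to}, or throws if the move is not in the table. */
  static ReaderState transition(ReaderState from, ReaderState to) {
    if (!isAllowed(from, to)) {
      throw new IllegalStateException("illegal transition " + from + " -> " + to);
    }
    return to;
  }
}
```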
> During the normal loading NN startup process, fall back on a different
> EditLog if we see one that is corrupt
> ------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-3049
> URL: https://issues.apache.org/jira/browse/HDFS-3049
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Minor
> Attachments: HDFS-3049.001.patch, HDFS-3049.002.patch,
> HDFS-3049.003.patch, HDFS-3049.005.against3335.patch,
> HDFS-3049.006.against3335.patch, HDFS-3049.007.against3335.patch,
> HDFS-3049.010.patch, HDFS-3049.011.patch, HDFS-3049.012.patch,
> HDFS-3049.013.patch, HDFS-3049.015.patch, HDFS-3049.017.patch,
> HDFS-3049.018.patch, HDFS-3049.021.patch, HDFS-3049.023.patch,
> HDFS-3049.025.patch, HDFS-3049.026.patch
>
>
> During the NameNode startup process, we load an image, and then apply edit
> logs to it until we believe that we have all the latest changes.
> Unfortunately, if there is an I/O error while reading any of these files, in
> most cases, we simply abort the startup process. We should try harder to
> locate a readable edit log and/or image file.
> *There are three main use cases for this feature:*
> 1. If the operating system does not honor fsync (usually due to a
> misconfiguration), a file may end up in an inconsistent state.
> 2. In certain older releases where we did not use fallocate() or similar to
> pre-reserve blocks, a disk full condition may cause a truncated log in one
> edit directory.
> 3. There may be a bug in HDFS which results in some of the data directories
> receiving corrupt data, but not all. This is the least likely use case.
> *Proposed changes to normal NN startup*
> * We should try a different FSImage if we can't load the first one we try.
> * We should examine other FSEditLogs if we can't load the first one(s) we try.
> * We should fail if we can't find EditLogs that would bring us up to what we
> believe is the latest transaction ID.
> *Proposed changes to recovery mode NN startup*
> we should list out all the available storage directories and allow the
> operator to select which one he wants to use.
> Something like this:
> {code}
> Multiple storage directories found.
> 1. /foo/bar
>    edits__current__XYZ size:213421345 md5:2345345
>    image size:213421345 md5:2345345
> 2. /foo/baz
>    edits__current__XYZ size:213421345 md5:2345345345
>    image size:213421345 md5:2345345
> Which one would you like to use? (1/2)
> {code}
> As usual in recovery mode, we want to be flexible about error handling. In
> this case, this means that we should NOT fail if we can't find EditLogs that
> would bring us up to what we believe is the latest transaction ID.
> *Not addressed by this feature*
> This feature will not address the case where an attempt to access the
> NameNode name directory or directories hangs because of an I/O error. This
> may happen, for example, when trying to load an image from a hard-mounted NFS
> directory, when the NFS server has gone away. Just as now, the operator will
> have to notice this problem and take steps to correct it.
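Below is a rough sketch of the proposed normal-startup behaviour, including the recovery-mode exception for a short edit log. Everything here is hypothetical: StartupFallbackSketch, Segment, and the "readable" flag, which stands in for an actual read attempt. It only illustrates the fallback order, not the real FSImage/FSEditLog loading code.

```java
import java.io.IOException;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the proposed startup fallback; names are illustrative. */
class StartupFallbackSketch {
  /** A candidate edit log segment in some storage directory, covering [first, last]. */
  static final class Segment {
    final String dir;
    final long first, last;
    final boolean readable; // stands in for "opened and read without an I/O error"

    Segment(String dir, long first, long last, boolean readable) {
      this.dir = dir;
      this.first = first;
      this.last = last;
      this.readable = readable;
    }
  }

  /** Tries each image in turn and returns the first one that loads. */
  static <T> T loadFirstGoodImage(List<T> images, Predicate<T> tryLoad) throws IOException {
    for (T image : images) {
      if (tryLoad.test(image)) return image;
    }
    throw new IOException("none of the " + images.size() + " candidate images could be loaded");
  }

  /**
   * Replays edits after imageTxId, switching to any readable segment (from any
   * directory) that holds the next needed txid. Returns the last txid applied.
   * Normal startup fails if that is short of expectedLastTxId; recovery mode
   * records a warning and keeps what it has.
   */
  static long replay(long imageTxId, List<Segment> segments, long expectedLastTxId,
      boolean recoveryMode, List<String> warnings) throws IOException {
    long next = imageTxId + 1;
    boolean progressed = true;
    while (next <= expectedLastTxId && progressed) {
      progressed = false;
      for (Segment s : segments) {
        if (s.readable && s.first <= next && s.last >= next) {
          // Stands in for skipping txids already applied and applying next..s.last.
          next = s.last + 1;
          progressed = true;
          break;
        }
      }
    }
    long lastApplied = next - 1;
    if (lastApplied < expectedLastTxId) {
      String msg = "edit logs end at txid " + lastApplied
          + " but the latest known txid is " + expectedLastTxId;
      if (!recoveryMode) throw new IOException(msg);
      warnings.add(msg);
    }
    return lastApplied;
  }
}
```

Suppose the segments are /foo/bar 1-10 (unreadable), /foo/baz 1-10 and /foo/bar 11-20. Replay then reaches txid 20 by taking 1-10 from /foo/baz. If the latest known txid were 25, normal startup would fail, while recovery mode would record a warning and stop at 20.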
--
This message is automatically generated by JIRA.