[jira] [Commented] (HDFS-3049) During the normal loading NN startup process, fall back on a different EditLog if we see one that is corrupt

Todd Lipcon (JIRA) Fri, 01 Jun 2012 17:38:25 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287803#comment-13287803
 ]


Todd Lipcon commented on HDFS-3049:
-----------------------------------

{code}
+ * We wil currently try each edit log stream exactly once.  In other words, we
{code}
typo: 'wil'
----
- please add a blank line before and after the definition of the {{State}} enum

- there are some WATERMELONs in your patch I think you should probably remove :)

- within a given RedundantEditLogInputStream, there's an expectation that the 
underlying streams all have the same start txid, right? You should check this 
in your preconditions loop

- not sure of the logic for EOF: let's say I have two streams, one is tx 1-15, 
and the other is 1-20. When we sort, they'll be in the order (1-20, 1-15). I 
then encounter an error at txid #5 in the first stream, so I switch to the 
second stream. This stream will then return "null" after reading txid #15, even 
though there are really 5 more txns in the group. Right?
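
To make the two points above concrete, here is a minimal, self-contained sketch. It is not the patch's code: SketchStream, RedundantReadSketch, checkSameStart, readNaive and longestLastTxId are hypothetical names, and a "stream" is reduced to a txid range plus an optional txid at which reading fails. It only illustrates the precondition check and how a naive failover could stop at txid 15 when the group really extends to txid 20.

```java
import java.util.List;

/** Hypothetical stand-in for one edit log stream covering [firstTxId, lastTxId]. */
final class SketchStream {
    final String name;
    final long firstTxId;
    final long lastTxId;
    final long corruptAt; // txid at which a read fails; -1 means the stream is clean

    SketchStream(String name, long firstTxId, long lastTxId, long corruptAt) {
        this.name = name;
        this.firstTxId = firstTxId;
        this.lastTxId = lastTxId;
        this.corruptAt = corruptAt;
    }
}

final class RedundantReadSketch {
    /** Precondition: every stream in the group must start at the same txid. */
    static void checkSameStart(List<SketchStream> streams) {
        if (streams.isEmpty()) {
            throw new IllegalArgumentException("no streams in group");
        }
        long first = streams.get(0).firstTxId;
        for (SketchStream s : streams) {
            if (s.firstTxId != first) {
                throw new IllegalArgumentException("stream " + s.name
                    + " starts at txid " + s.firstTxId + ", expected " + first);
            }
        }
    }

    /** The last txid that any stream in the group claims to contain. */
    static long longestLastTxId(List<SketchStream> streams) {
        long max = Long.MIN_VALUE;
        for (SketchStream s : streams) {
            max = Math.max(max, s.lastTxId);
        }
        return max;
    }

    /**
     * Naive failover: on a read error, move to the next stream in the given
     * order, and treat the end of whichever stream is current as the end of
     * the whole group. Returns the last txid that was read successfully.
     */
    static long readNaive(List<SketchStream> streams) {
        checkSameStart(streams);
        long nextTxId = streams.get(0).firstTxId;
        int curIdx = 0;
        while (curIdx < streams.size()) {
            SketchStream cur = streams.get(curIdx);
            if (nextTxId > cur.lastTxId) {
                return nextTxId - 1; // current stream reports EOF
            }
            if (nextTxId == cur.corruptAt) {
                curIdx++; // read error: fail over to the next stream
                continue;
            }
            nextTxId++; // txid read successfully
        }
        throw new IllegalStateException("all streams failed at txid " + nextTxId);
    }
}
```

With streams ordered (1-20 with an error at txid 5, then a clean 1-15), readNaive stops after txid 15 while longestLastTxId reports 20, which is the gap described above.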

----

{code}
+              streams[curIdx].getName() + ".  During automatic failover, " +
+              "we noticed that all of the remaining edit log streams are " +
+              "shorter than the current one!  The best " + 
{code}

I don't like using the term "automatic failover" here - because that's the 
terminology we use for HA. Instead, perhaps something like "We could not find 
any other edit log which contains transactions following txid %d"?

----

{code}
+        LOG.error("Got error reading edit log input stream " +
+          streams[curIdx].getName(), prevException);
+        LOG.error("failing over to edit log " + 
+          streams[curIdx + 1].getName());
{code}
Combine these into one log message
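
One possible shape for the combined message, as a sketch only: FailoverLogSketch and failoverMessage are hypothetical names, and the wording simply joins the two existing messages.

```java
/** Hypothetical helper that builds a single message for the failover case. */
final class FailoverLogSketch {
    /** Joins the read-error report and the failover notice into one line. */
    static String failoverMessage(String failedName, String nextName) {
        return "Got error reading edit log input stream " + failedName
            + "; failing over to edit log " + nextName;
    }
}
```

The call site would then be a single LOG.error(...) that passes this string together with prevException, so the stack trace stays attached to the one message.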

----

I find the state machine here somewhat confusing. Is there no clearer way to 
write it? Maybe an ascii art transition diagram would help, or at least for 
each state a list of which states it can transition to, and under what 
circumstances? To paraphrase someone or other, it's "not obviously incorrect" 
but also not "obviously correct" :)
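
As an illustration of the "list of which states it can transition to" idea, here is a small sketch. The state names (READING, STREAM_FAILED, EOF) and the methods allowedNext and transitionTo are invented for this example and are not taken from the patch; the real {{State}} enum may look quite different.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical states, each documented with the states it may move to. */
enum SketchState {
    /** Reading ops. -> READING (op read), STREAM_FAILED (read error), EOF (no more ops). */
    READING,
    /** Current stream hit an error. -> READING once another stream has been selected. */
    STREAM_FAILED,
    /** Terminal: no further transitions. */
    EOF;

    /** The set of states this state is allowed to move to. */
    Set<SketchState> allowedNext() {
        switch (this) {
            case READING:
                return EnumSet.of(READING, STREAM_FAILED, EOF);
            case STREAM_FAILED:
                return EnumSet.of(READING);
            default:
                return EnumSet.noneOf(SketchState.class);
        }
    }

    /** Returns next if the transition is allowed, otherwise throws. */
    SketchState transitionTo(SketchState next) {
        if (!allowedNext().contains(next)) {
            throw new IllegalStateException(this + " -> " + next + " is not allowed");
        }
        return next;
    }
}
```

Keeping the allowed transitions in one place like this makes each one easy to review, and an unexpected transition fails loudly instead of silently.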

                
> During the normal loading NN startup process, fall back on a different 
> EditLog if we see one that is corrupt
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3049
>                 URL: https://issues.apache.org/jira/browse/HDFS-3049
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-3049.001.patch, HDFS-3049.002.patch, 
> HDFS-3049.003.patch, HDFS-3049.005.against3335.patch, 
> HDFS-3049.006.against3335.patch, HDFS-3049.007.against3335.patch, 
> HDFS-3049.010.patch, HDFS-3049.011.patch, HDFS-3049.012.patch, 
> HDFS-3049.013.patch, HDFS-3049.015.patch, HDFS-3049.017.patch, 
> HDFS-3049.018.patch, HDFS-3049.021.patch, HDFS-3049.023.patch, 
> HDFS-3049.025.patch, HDFS-3049.026.patch
>
>
> During the NameNode startup process, we load an image, and then apply edit 
> logs to it until we believe that we have all the latest changes.  
> Unfortunately, if there is an I/O error while reading any of these files, in 
> most cases, we simply abort the startup process.  We should try harder to 
> locate a readable edit log and/or image file.
> *There are three main use cases for this feature:*
> 1. If the operating system does not honor fsync (usually due to a 
> misconfiguration), a file may end up in an inconsistent state.
> 2. In certain older releases where we did not use fallocate() or similar to 
> pre-reserve blocks, a disk full condition may cause a truncated log in one 
> edit directory.
> 3. There may be a bug in HDFS which results in some of the data directories 
> receiving corrupt data, but not all.  This is the least likely use case.
> *Proposed changes to normal NN startup*
> * We should try a different FSImage if we can't load the first one we try.
> * We should examine other FSEditLogs if we can't load the first one(s) we try.
> * We should fail if we can't find EditLogs that would bring us up to what we 
> believe is the latest transaction ID.
> *Proposed changes to recovery mode NN startup*
> * We should list all the available storage directories and allow the 
> operator to select which one to use.
> Something like this:
> {code}
> Multiple storage directories found.
> 1. /foo/bar
>     edits__current__XYZ    size:213421345    md5:2345345
>     image                  size:213421345    md5:2345345
> 2. /foo/baz
>     edits__current__XYZ    size:213421345    md5:2345345345
>     image                  size:213421345    md5:2345345
> Which one would you like to use? (1/2)
> {code}
> As usual in recovery mode, we want to be flexible about error handling.  In 
> this case, this means that we should NOT fail if we can't find EditLogs that 
> would bring us up to what we believe is the latest transaction ID.
> *Not addressed by this feature*
> This feature will not address the case where an attempt to access the 
> NameNode name directory or directories hangs because of an I/O error.  This 
> may happen, for example, when trying to load an image from a hard-mounted NFS 
> directory, when the NFS server has gone away.  Just as now, the operator will 
> have to notice this problem and take steps to correct it.

