[
https://issues.apache.org/jira/browse/HDFS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299223#comment-14299223
]
Chris Nauroth commented on HDFS-7714:
-------------------------------------
Here are more details on what I've observed. I saw that the main
{{BPServiceActor#run}} loop was active for one NameNode, but the actor for the
other NameNode had reported the fatal "Initialization failed" error from this
part of the code:
{code}
while (true) {
  // init stuff
  try {
    // setup storage
    connectToNNAndHandshake();
    break;
  } catch (IOException ioe) {
    // Initial handshake, storage recovery or registration failed
    runningState = RunningState.INIT_FAILED;
    if (shouldRetryInit()) {
      // Retry until all namenode's of BPOS failed initialization
      LOG.error("Initialization failed for " + this + " "
          + ioe.getLocalizedMessage());
      sleepAndLogInterrupts(5000, "initializing");
    } else {
      runningState = RunningState.FAILED;
      LOG.fatal("Initialization failed for " + this + ". Exiting. ", ioe);
      return;
    }
  }
}
{code}
The {{ioe}} was an {{EOFException}} while trying the {{registerDatanode}} RPC.
Lining up timestamps from NN and DN logs, I could see that the NN had restarted
at the same time, causing it to abandon this RPC connection, ultimately
triggering the {{EOFException}} on the DataNode side.
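For illustration only (plain {{java.net}} sockets rather than the Hadoop RPC
stack, with made-up class names), this is the class of failure a reader sees
when the peer drops a connection before sending its response:
{code}
import java.io.DataInputStream;
import java.io.EOFException;
import java.net.ServerSocket;
import java.net.Socket;

// Standalone demo: the "server" accepts and immediately closes the
// connection, much like a restarting NameNode abandoning an in-flight
// request; the client then hits EOFException while waiting for a reply.
public class AbandonedConnectionDemo {
  public static void main(String[] args) throws Exception {
    try (ServerSocket server = new ServerSocket(0);
         Socket client = new Socket("localhost", server.getLocalPort())) {
      server.accept().close();
      try (DataInputStream in = new DataInputStream(client.getInputStream())) {
        in.readInt();   // expects a response that never arrives
      } catch (EOFException e) {
        System.out.println("connection closed by peer: " + e);
      }
    }
  }
}
{code}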
Most importantly, the fact that it was on the code path with the fatal-level
logging means that it would never reattempt registration with this NameNode.
{{shouldRetryInit()}} must have returned {{false}}. The implementation of
{{BPOfferService#shouldRetryInit}} only keeps retrying if the other NameNode
has already registered successfully or the offer service still reports itself
alive:
{code}
/*
 * Let the actor retry for initialization until all namenodes of cluster have
 * failed.
 */
boolean shouldRetryInit() {
  if (hasBlockPoolId()) {
    // One of the namenode registered successfully. lets continue retry for
    // other.
    return true;
  }
  return isAlive();
}
{code}
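To spell out the decision table, here is a minimal standalone restatement of
that logic (the method and parameters are simplified stand-ins, not the real
{{BPOfferService}}); the fatal exit we observed corresponds to the row where
neither condition holds:
{code}
public class ShouldRetryInitSketch {
  // Simplified restatement: retry init while some NameNode of this
  // nameservice has already registered (block pool ID known), or the
  // offer service still reports itself alive.
  static boolean shouldRetryInit(boolean hasBlockPoolId, boolean isAlive) {
    if (hasBlockPoolId) {
      return true;   // the other NameNode registered; keep retrying this one
    }
    return isAlive;
  }

  public static void main(String[] args) {
    System.out.println(shouldRetryInit(true,  true));   // true  -> retry
    System.out.println(shouldRetryInit(true,  false));  // true  -> retry
    System.out.println(shouldRetryInit(false, true));   // true  -> retry
    System.out.println(shouldRetryInit(false, false));  // false -> FATAL, actor exits
  }
}
{code}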
Tying that all together, this bug happens when the first attempted NameNode
registration fails but the second succeeds. The DataNode process remains
running, but with only one live {{BPServiceActor}}.
HDFS-2882 had a lot of discussion of DataNode startup failure scenarios. I
think the summary of that discussion is that the DataNode should in general
retry its NameNode registrations, but it should abort right away if there is
no possibility of registration ever succeeding (e.g. a misconfiguration or a
hardware failure). I think the change we need here is to keep retrying the
{{registerDatanode}} RPC when the failure is NameNode downtime or a transient
connectivity problem. Other failure reasons should still cause an abort.
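To make that concrete, here is a rough sketch of the retry policy I have in
mind; the exception classification and the {{registerCall}} hook are
illustrative placeholders, not the actual {{BPServiceActor}} code:
{code}
import java.io.EOFException;
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Sketch only: keep retrying registration while the failure looks like
// NameNode downtime or transient connectivity trouble; let anything else
// propagate so the existing abort path still applies.
public class RegistrationRetrySketch {
  static void registerWithRetry(Callable<Void> registerCall) throws Exception {
    while (true) {
      try {
        registerCall.call();          // stands in for the registerDatanode RPC
        return;                       // registered successfully
      } catch (EOFException | ConnectException
          | NoRouteToHostException | SocketTimeoutException e) {
        // NameNode restarting, unreachable, or slow: treat as transient.
        System.err.println("Transient registration failure, will retry: " + e);
        Thread.sleep(5000);
      }
      // Any other IOException (version mismatch, disallowed DataNode, bad
      // config) propagates to the caller and still causes an abort.
    }
  }
}
{code}
The exact set of exceptions treated as transient is the debatable part; the
sketch only names the distinction we'd need to draw.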
> Simultaneous restart of HA NameNodes and DataNode can cause DataNode to
> register successfully with only one NameNode.
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-7714
> URL: https://issues.apache.org/jira/browse/HDFS-7714
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.6.0
> Reporter: Chris Nauroth
>
> In an HA deployment, DataNodes must register with both NameNodes and send
> periodic heartbeats and block reports to both. However, if NameNodes and
> DataNodes are restarted simultaneously, then this can trigger a race
> condition in registration. The end result is that the {{BPServiceActor}} for
> one NameNode terminates, but the {{BPServiceActor}} for the other NameNode
> remains alive. The DataNode process is then in a "half-alive" state where it
> only heartbeats and sends block reports to one of the NameNodes. This could
> cause a loss of storage capacity after an HA failover. The DataNode process
> would have to be restarted to resolve this.