[ 
https://issues.apache.org/jira/browse/HDFS-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052908#comment-13052908
 ] 

Todd Lipcon commented on HDFS-2026:
-----------------------------------

bq. Can we remove Checkpointer#uploadCheckpoint commented out? 
Added TODO -- the CN/BN will be addressed separately

bq. testReformatNNBetweenCheckpoints method comment is missing a period
fixed

bq. The new call to sd.read in SecondaryNameNode#recoverCreate could use a 
comment
added:
{code}
          case NORMAL:
            // Read the VERSION file. This verifies that:
            // (a) the VERSION file for each of the directories is the same,
            // and (b) when we connect to a NN, we can verify that the remote
            // node matches the same namespace that we ran on previously.
            sd.read();
            break;
{code}

bq. As an aside, readVersionFile would be a better name for that method
I agree, but we should do that separately -- this function gets used throughout 
all of HDFS (eg also on the DN side)

bq. Not you change would be good to add a comment to uploadImageFromStorage 
indicating it doesn't actually post an image but the 2NN posts to the NN asking 
it to get an image
Added the following javadoc:
{code}
  /**
   * Requests that the NameNode download an image from this node.
   *
   * @param fsName the http address for the remote NN
   * @param imageListenAddress the host/port where the local node is running an
   *                           HTTPServer hosting GetImageServlet
   * @param storage the storage directory to transfer the image from
   * @param txid the transaction ID of the image to be uploaded
   */
{code}
and this comment:
{code}
    // this doesn't directly upload an image, but rather asks the NN
    // to connect back to the 2NN to download the specified image.
    TransferFsImage.getFileClient(fsName, fileid, null, false);
...
{code}

> 1073: 2NN needs to handle case of reformatted NN better
> -------------------------------------------------------
>
>                 Key: HDFS-2026
>                 URL: https://issues.apache.org/jira/browse/HDFS-2026
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: hdfs-2026.txt
>
>
> Currently in the 1073 branch, the following steps ends up with a very 
> confused 2NN:
> - format NN, run NN
> - start 2NN, perform some checkpoints
> - reformat NN, start NN on new namespace
> - restart same 2NN
> The 2NN currently saves the new VERSION info into its local storage directory 
> but doesn't clear out the old checkpoint or edits files. This is obviously 
> wrong and might lead to a corrupt checkpoint getting uploaded. 
> If the 2NN has storage directories with VERSION info, and connects to an NN 
> with different VERSION info, there are two alternatives:
> a) refuse to perform any checkpoints until the operator issues a 
> "secondarynamenode -format" command (this is similar to how the 
> backupnode/checkpointnode works)
> b) clear the current contents of the storage directory and save the new NN's 
> VERSION info.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to