[ 
https://issues.apache.org/jira/browse/HDFS-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2026:
------------------------------

    Attachment: hdfs-2026.txt

Here's a patch which does the following:

- when the 2NN talks to the NN, or the NN talks to the 2NN, the caller passes 
a ':'-joined version of its StorageInfo (i.e. namespaceid, clusterid, etc.). 
If the other side has a different namespace, it throws an Exception and 
refuses to process the request
- on startup, the 2NN now reads the storage info from its storage directories
- a fresh 2NN has no stored info (and thus namespaceId == 0) -- in this 
case it copies its storage info from the NN the first time it calls 
rollEdits and gets a CheckpointSignature. All other times, it verifies that 
the CheckpointSignature matches the 2NN's storage info.
- I removed the defunct "token" parameter from GetImageServlet since it wasn't 
really being used anymore.
- There is no longer any need to validate the transaction ID of the uploaded 
checkpoint, since it's OK to upload an out-of-date image. It'll just get 
removed the next time the archiver runs.
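
For readers without the patch at hand, the storage-info handshake described 
above can be sketched roughly as follows. This is only an illustration, not 
the code in hdfs-2026.txt: the class StorageInfoSketch, its two-field layout, 
and the method names are all hypothetical, and the real StorageInfo carries 
more fields than shown here. The sketch also assumes the cluster ID contains 
no ':' character.

```java
import java.io.IOException;

/** Hypothetical stand-in for the storage info exchanged between NN and 2NN. */
final class StorageInfoSketch {
    int namespaceId;   // 0 means "fresh": nothing has been stored yet
    String clusterId;

    StorageInfoSketch(int namespaceId, String clusterId) {
        this.namespaceId = namespaceId;
        this.clusterId = clusterId;
    }

    /** ':'-joined form that the caller passes along with its request. */
    String toColonSeparatedString() {
        return namespaceId + ":" + clusterId;
    }

    /** Parses the ':'-joined form; rejects anything malformed. */
    static StorageInfoSketch parse(String s) throws IOException {
        String[] parts = s.split(":");
        if (parts.length != 2) {
            throw new IOException("Malformed storage info: " + s);
        }
        try {
            return new StorageInfoSketch(Integer.parseInt(parts[0]), parts[1]);
        } catch (NumberFormatException e) {
            throw new IOException("Malformed namespace ID in: " + s, e);
        }
    }

    /** Receiving side: refuse the request if the peer's storage info differs. */
    void validate(String peerInfo) throws IOException {
        StorageInfoSketch peer = parse(peerInfo);
        if (peer.namespaceId != namespaceId || !peer.clusterId.equals(clusterId)) {
            throw new IOException("Inconsistent storage info: mine="
                + toColonSeparatedString() + " theirs=" + peerInfo);
        }
    }

    /** 2NN side, given the storage info carried by a CheckpointSignature. */
    void adoptOrVerify(StorageInfoSketch fromSignature) throws IOException {
        if (namespaceId == 0) {
            // Fresh 2NN: copy the NN's storage info on first contact.
            namespaceId = fromSignature.namespaceId;
            clusterId = fromSignature.clusterId;
        } else {
            // Otherwise the signature must match what is already stored.
            validate(fromSignature.toColonSeparatedString());
        }
    }
}
```

In this sketch, a 2NN that previously checkpointed one namespace and is then 
pointed at a reformatted NN fails in adoptOrVerify with an IOException rather 
than mixing old and new state in its storage directories.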


> 1073: 2NN needs to handle case of reformatted NN better
> -------------------------------------------------------
>
>                 Key: HDFS-2026
>                 URL: https://issues.apache.org/jira/browse/HDFS-2026
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: hdfs-2026.txt
>
>
> Currently in the 1073 branch, the following steps end up with a very 
> confused 2NN:
> - format NN, run NN
> - start 2NN, perform some checkpoints
> - reformat NN, start NN on new namespace
> - restart same 2NN
> The 2NN currently saves the new VERSION info into its local storage directory 
> but doesn't clear out the old checkpoint or edits files. This is obviously 
> wrong and might lead to a corrupt checkpoint getting uploaded. 
> If the 2NN has storage directories with VERSION info, and connects to an NN 
> with different VERSION info, there are two alternatives:
> a) refuse to perform any checkpoints until the operator issues a 
> "secondarynamenode -format" command (this is similar to how the 
> backupnode/checkpointnode works)
> b) clear the current contents of the storage directory and save the new NN's 
> VERSION info.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
