[ https://issues.apache.org/jira/browse/HDFS-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HDFS-2026:
------------------------------
Attachment: hdfs-2026.txt
Here's a patch which does the following:
- when the 2NN talks to the NN, or the NN talks to the 2NN, the caller passes
a ':'-joined version of its StorageInfo (i.e. namespaceid, clusterid, etc.).
If the receiving side has a different namespace, it throws an Exception and
refuses to process the request
- on startup, the 2NN now reads the storage info from its storage directories
- a fresh 2NN has no storage info (and thus namespaceId == 0). In this case
it copies its storage info from the NN the first time it calls rollEdits and
gets a CheckpointSignature. On every later call, it verifies that the
CheckpointSignature matches the 2NN's storage info.
- I removed the defunct "token" parameter from GetImageServlet since it wasn't
really being used anymore.
- It is no longer necessary to validate the transaction ID of the uploaded
checkpoint, since it's OK to upload an out-of-date image. It'll just get
removed the next time the archiver runs.
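
To make the handshake in the first three bullets concrete, here is a minimal
Java sketch. It is not code from the attached patch. The class
StorageInfoSketch, its method names, the field order in the ':'-joined string,
the inclusion of layoutVersion, and the choice to compare both namespaceId and
clusterId are all assumptions made for illustration.

```java
import java.io.IOException;

// Hypothetical stand-in for the StorageInfo fields mentioned above.
// This is NOT the real Hadoop class; names and field order are assumptions.
final class StorageInfoSketch {
    final int layoutVersion; // assumed to be part of the "etc"
    final int namespaceId;   // 0 means "no storage info yet" (fresh 2NN)
    final String clusterId;

    StorageInfoSketch(int layoutVersion, int namespaceId, String clusterId) {
        this.layoutVersion = layoutVersion;
        this.namespaceId = namespaceId;
        this.clusterId = clusterId;
    }

    // The ':'-joined form that one side sends along with each request.
    String toColonSeparatedString() {
        return layoutVersion + ":" + namespaceId + ":" + clusterId;
    }

    // Parse the ':'-joined form; a limit of 3 keeps the cluster ID in one piece.
    static StorageInfoSketch parse(String s) throws IOException {
        String[] parts = s.split(":", 3);
        if (parts.length != 3) {
            throw new IOException("Malformed storage info: " + s);
        }
        try {
            return new StorageInfoSketch(
                Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), parts[2]);
        } catch (NumberFormatException e) {
            throw new IOException("Malformed storage info: " + s, e);
        }
    }

    boolean isFresh() {
        return namespaceId == 0;
    }

    // Receiving side: refuse the request if the peer is in another namespace.
    void validatePeer(String peerInfo) throws IOException {
        StorageInfoSketch peer = parse(peerInfo);
        if (peer.namespaceId != namespaceId || !peer.clusterId.equals(clusterId)) {
            throw new IOException("Inconsistent storage info: local="
                + toColonSeparatedString() + " peer=" + peerInfo);
        }
    }

    // 2NN side: a fresh 2NN adopts the NN's info from the first signature;
    // afterwards every signature must match what is already stored.
    static StorageInfoSketch adoptOrVerify(StorageInfoSketch local,
                                           String signatureInfo) throws IOException {
        if (local.isFresh()) {
            return parse(signatureInfo);
        }
        local.validatePeer(signatureInfo);
        return local;
    }

    public static void main(String[] args) throws IOException {
        StorageInfoSketch nn = new StorageInfoSketch(-38, 12345, "CID-example");
        StorageInfoSketch fresh = new StorageInfoSketch(0, 0, "");

        // First checkpoint: the fresh 2NN copies the NN's storage info.
        StorageInfoSketch adopted = adoptOrVerify(fresh, nn.toColonSeparatedString());
        System.out.println("adopted: " + adopted.toColonSeparatedString());

        // NN reformatted into a new namespace: the old 2NN must refuse.
        StorageInfoSketch reformatted = new StorageInfoSketch(-38, 67890, "CID-other");
        try {
            adoptOrVerify(adopted, reformatted.toColonSeparatedString());
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch, the second call is rejected because the stored namespaceId no
longer matches the one presented by the reformatted NN, which is the situation
the issue describes.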
> 1073: 2NN needs to handle case of reformatted NN better
> -------------------------------------------------------
>
> Key: HDFS-2026
> URL: https://issues.apache.org/jira/browse/HDFS-2026
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Affects Versions: Edit log branch (HDFS-1073)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2026.txt
>
>
> Currently in the 1073 branch, the following steps end up with a very
> confused 2NN:
> - format NN, run NN
> - start 2NN, perform some checkpoints
> - reformat NN, start NN on new namespace
> - restart same 2NN
> The 2NN currently saves the new VERSION info into its local storage directory
> but doesn't clear out the old checkpoint or edits files. This is obviously
> wrong and might lead to a corrupt checkpoint getting uploaded.
> If the 2NN has storage directories with VERSION info, and connects to an NN
> with different VERSION info, there are two alternatives:
> a) refuse to perform any checkpoints until the operator issues a
> "secondarynamenode -format" command (this is similar to how the
> backupnode/checkpointnode works)
> b) clear the current contents of the storage directory and save the new NN's
> VERSION info.
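
The difference between alternatives (a) and (b) in the quoted description can
be sketched as a small policy switch. This is an illustrative in-memory model
only: MismatchPolicySketch, Policy, CheckpointDir, and reconcile are
hypothetical names, and a real 2NN would operate on on-disk storage
directories rather than a list of file names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two alternatives; not taken from Hadoop.
final class MismatchPolicySketch {
    enum Policy {
        REFUSE_UNTIL_FORMAT, // alternative (a): wait for the operator to format
        CLEAR_AND_ADOPT      // alternative (b): wipe old state, take new VERSION
    }

    // In-memory stand-in for a 2NN storage directory.
    static final class CheckpointDir {
        int namespaceId; // 0 means no VERSION info has been saved yet
        final List<String> files = new ArrayList<>();

        CheckpointDir(int namespaceId) {
            this.namespaceId = namespaceId;
        }
    }

    // Returns true if checkpointing may proceed against the given NN.
    static boolean reconcile(CheckpointDir dir, int nnNamespaceId, Policy policy) {
        if (dir.namespaceId == 0 || dir.namespaceId == nnNamespaceId) {
            dir.namespaceId = nnNamespaceId; // fresh or already matching
            return true;
        }
        switch (policy) {
            case REFUSE_UNTIL_FORMAT:
                return false; // leave old checkpoint and edits files untouched
            case CLEAR_AND_ADOPT:
                dir.files.clear(); // drop stale checkpoint and edits files
                dir.namespaceId = nnNamespaceId;
                return true;
            default:
                throw new AssertionError(policy);
        }
    }

    public static void main(String[] args) {
        CheckpointDir dir = new CheckpointDir(111);
        dir.files.add("old-checkpoint");
        System.out.println(reconcile(dir, 222, Policy.REFUSE_UNTIL_FORMAT));
        System.out.println(reconcile(dir, 222, Policy.CLEAR_AND_ADOPT));
    }
}
```

Either policy avoids the broken state described above, where the new VERSION
info is saved next to checkpoint and edits files from the old namespace.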
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira