[ 
https://issues.apache.org/jira/browse/HDFS-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977747#action_12977747
 ] 

Gokul commented on HDFS-107:
----------------------------


The second approach looks fine to me.

I don't think a datanode losing its blocks when it mistakenly connects to an 
empty namenode is a drawback at all.
Even in the current scenario, if a datanode mistakenly connects to another 
namenode, the probability of that namenode having the same blocks (those of 
this datanode) in its blocksmap is very low. In most cases the namenode will 
invalidate the blocks.

{quote}
2) Format the name-node only. When data-nodes connect to the name-node it will 
tell them to format their storage directories if it sees that the namespace is 
empty and its cTime=0.
The drawback of this approach is that we can lose blocks of a data-node from 
another cluster if it connects by mistake to the empty name-node.
{quote}

When the datanode starts (after the namenode has been formatted and started), 
can we override the namespace ID of the datanode with the new namespace ID of 
the namenode instead of throwing an exception?
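
To make the question concrete, here is a minimal sketch of the two behaviours 
at datanode startup. It is not the actual DataNode code: the class name, the 
method name and the adoptNewId flag are hypothetical, and the exception text 
beyond "Incompatible namespaceIDs" is illustrative only.

```java
import java.io.IOException;

// Hypothetical sketch, not the real DataNode startup code.
final class NamespaceIdCheck {

    /**
     * Returns the namespace ID the datanode should keep after contacting the
     * namenode.
     *
     * @param storedId   namespace ID read from the datanode's version file
     * @param nameNodeId namespace ID reported by the namenode
     * @param adoptNewId hypothetical switch: true = proposed behaviour
     *                   (override), false = current behaviour (fail)
     */
    static int resolve(int storedId, int nameNodeId, boolean adoptNewId)
            throws IOException {
        if (storedId == nameNodeId) {
            // IDs match, nothing to do.
            return storedId;
        }
        if (adoptNewId) {
            // Proposed: take over the namenode's new namespace ID.
            return nameNodeId;
        }
        // Current: refuse to start. Message suffix is illustrative.
        throw new IOException("Incompatible namespaceIDs (illustrative): "
                + "stored=" + storedId + ", namenode=" + nameNodeId);
    }
}
```

With adoptNewId set to false a mismatch fails as it does today; with true the 
datanode would continue under the new ID. The sketch deliberately leaves open 
what happens to the old blocks in that case.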



> Data-nodes should be formatted when the name-node is formatted.
> ---------------------------------------------------------------
>
>                 Key: HDFS-107
>                 URL: https://issues.apache.org/jira/browse/HDFS-107
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Konstantin Shvachko
>
> The upgrade feature HADOOP-702 requires data-nodes to persistently store the 
> namespaceID in their version files and verify during startup that it matches 
> the one stored on the name-node.
> When the name-node reformats it generates a new namespaceID.
> Now if the cluster starts with the reformatted name-node and data-nodes that 
> were not reformatted, the data-nodes will fail with
> java.io.IOException: Incompatible namespaceIDs ...
> Data-nodes should be reformatted whenever the name-node is. I see 2 
> approaches here:
> 1) In order to reformat the cluster we call "start-dfs -format" or make a 
> special script "format-dfs".
> This would format the cluster components all together. The question is 
> whether it should start the cluster after formatting?
> 2) Format the name-node only. When data-nodes connect to the name-node it 
> will tell them to format their storage directories if it sees that the 
> namespace is empty and its cTime=0.
> The drawback of this approach is that we can lose blocks of a data-node from 
> another cluster if it connects by mistake to the empty name-node.
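
The condition described in approach 2 can be sketched as a single predicate on 
the name-node side. This is only an illustration under stated assumptions: the 
class and method names are hypothetical and do not come from the HDFS code 
base.

```java
// Hypothetical sketch of the approach-2 decision, not actual NameNode code.
final class FormatDecision {

    /**
     * Returns true when the name-node would tell a connecting data-node to
     * format its storage directories.
     *
     * @param namespaceEmpty whether the name-node's namespace is empty
     * @param cTime          the name-node's cTime
     */
    static boolean shouldTellDataNodeToFormat(boolean namespaceEmpty,
                                              long cTime) {
        // Nothing here identifies which cluster the data-node came from,
        // which is the drawback noted in the description: a data-node from
        // another cluster that connects by mistake gets the same answer.
        return namespaceEmpty && cTime == 0L;
    }
}
```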

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
