[ 
https://issues.apache.org/jira/browse/HADOOP-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522670
 ] 

Raghu Angadi commented on HADOOP-1762:
--------------------------------------

Proposed implementation :

 - Namenode stores one integer {{lastStorageId}} persistently
 - When a Namenode starts, it does not know about any storageIds except 
{{lastStorageId}}
 - When a datanode D1 registers: {code}
    if (D1.storageID == 0 || D1.storageID > lastStorageId) {
       // D1 has no id yet, or has one this namenode could not have issued
       D1.storageID = ++lastStorageId; // take care of overflow etc.
       EditLog.write(LAST_STORAGE_ID, lastStorageId);
    }
    // Same as current behaviour:
    // check if D1.storageID is already registered, etc.
    {code}

 - Another, simpler alternative: don't keep track of lastStorageId at all, and 
always assign a random storage id whenever a new storage id is required. 
Especially if we use a 64-bit integer, the probability of a collision is very low.
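
To put a rough number on the collision claim for random ids, here is a small sketch using the standard birthday-bound approximation p ~ 1 - exp(-n(n-1) / 2^(bits+1)). The class name, method name and the figure of 10,000 issued ids are illustrative assumptions, not anything from the Hadoop code base.

```java
class CollisionEstimate {
    /**
     * Approximate probability that at least two of n uniformly random
     * ids drawn from a space of 2^bits values are equal (birthday bound).
     */
    static double collisionProbability(long n, int bits) {
        double space = Math.pow(2.0, bits);
        double exponent = -((double) n * (n - 1)) / (2.0 * space);
        // -expm1(x) computes 1 - exp(x) without losing precision for tiny x
        return -Math.expm1(exponent);
    }

    public static void main(String[] args) {
        long n = 10_000; // assumed number of storage ids ever issued
        System.out.printf("32-bit: %.3e%n", collisionProbability(n, 32));
        System.out.printf("64-bit: %.3e%n", collisionProbability(n, 64));
    }
}
```

Under these assumptions the 32-bit figure is on the order of 1e-2 and the 64-bit figure on the order of 1e-12, which is the main argument for the wider integer if ids are random.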

 - What about when lastStorageId reaches INT_MAX? We can use a 64-bit integer, 
and probably should. Even if a 32-bit integer rolls over, it's ok.
 - In either case, collision probability would still be minuscule compared to 
probability of similar damage (losing a datanode).
 - If there is an actual collision, apart from namenode losing one datanode, 
there is another consequence : If two nodes Dx and Dy get the same storage id, 
then each will keep replacing the other at the namenode. To avoid this, 
whenever a new datanode registers with an existing storage id, just assign a 
new storage id, instead of reusing the old one.
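
The collision rule in the last point could look roughly like the sketch below. It assumes the namenode can tell datanodes apart by some name (a hypothetical {{nodeName}} string such as host:port) and that ids are random 64-bit values with 0 meaning "not assigned"; the class and method names are made up for illustration and are not the existing Namenode API.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

class StorageIdRegistry {
    // In-memory only: nothing here needs to survive a namenode restart.
    private final Map<Long, String> ownerById = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    /** Returns the storage id the datanode should use from now on. */
    synchronized long register(String nodeName, long claimedId) {
        String owner = ownerById.get(claimedId);
        boolean needsNewId = claimedId == 0                     // never assigned
                || (owner != null && !owner.equals(nodeName));  // held by another node
        long id = needsNewId ? freshId() : claimedId;
        ownerById.put(id, nodeName);
        return id;
    }

    /** Draws random ids until one is non-zero and not currently registered. */
    private long freshId() {
        long id;
        do {
            id = random.nextLong();
        } while (id == 0 || ownerById.containsKey(id));
        return id;
    }
}
```

With this rule, if Dy shows up claiming the id that Dx already holds, Dy is handed a fresh id and Dx keeps its registration, so the two never keep replacing each other.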

 - If we use the 'lastStorageId' method, then when a datanode starts up under 
Hadoop 0.15 for the first time, it should zero out its storage id. Apart from 
this, no other changes are required at the datanode.

I personally prefer the random storage id.





> Namenode does not need to store storageID and datanodeID persistently
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-1762
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1762
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>
> Currently Namenode stores all the storage-ids it generates since the 
> beginning (since last format). It allocates a new storageID every time a new 
> datanode comes online. It also stores all the known datanode ids since the 
> beginning. 
> It would be better if Namenode did not have to keep track of these. I will 
> describe a proposal in the next comment. 
> This has implications regarding how Namenode helps administrators identify 'dead 
> datanodes' etc. These issues are addressed in HADOOP-1138.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
