Re: [jira] Commented: (HADOOP-124) don't permit two datanodes to run from same dfs.data.dir

Eric Baldeschwieler Wed, 17 May 2006 20:23:01 -0700

Why not store the cluster in the data node?

On May 17, 2006, at 6:39 PM, Konstantin Shvachko (JIRA) wrote:

[ http://issues.apache.org/jira/browse/HADOOP-124?page=comments#action_12412273 ]
Konstantin Shvachko commented on HADOOP-124:
--------------------------------------------

For future development in this direction:
We should persistently store on the name node all storage IDs to which
the name node has ever assigned any blocks.
With that knowledge the name node can reject blocks from any newly
registered data storages that are not on the name node list.
In other words, when a data node registers a NEW data storage it should
not report any blocks from that storage, and the name node can
effectively verify that, since it never assigned any blocks to this
storage.
This would prevent us from accidentally connecting data nodes
representing different clusters (DFS instances).
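
The check proposed above could look roughly like the following. This is
only a minimal sketch: the class and method names are hypothetical and
are not taken from the Hadoop code base, and the set of IDs is kept in
memory here, whereas the proposal calls for storing it persistently on
the name node.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Hadoop code base.
class StorageRegistry {
    // Storage IDs the name node has ever assigned blocks to.
    // The proposal requires this to be persistent; it is in memory here.
    private final Set<String> knownStorageIds = new HashSet<>();

    // Called whenever the name node assigns a block to a storage.
    void recordAssignment(String storageId) {
        knownStorageIds.add(storageId);
    }

    // Decides whether a block report from a registering storage is accepted.
    boolean acceptBlockReport(String storageId, int reportedBlockCount) {
        if (knownStorageIds.contains(storageId)) {
            // Known storage: it may legitimately hold blocks.
            return true;
        }
        // Unknown (new) storage: it must not report any blocks, because
        // this name node never assigned any to it.
        return reportedBlockCount == 0;
    }
}
```

With this rule, a storage that arrives carrying blocks from a different
DFS instance is rejected, while an empty new storage is allowed to join.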
don't permit two datanodes to run from same dfs.data.dir
--------------------------------------------------------

         Key: HADOOP-124
         URL: http://issues.apache.org/jira/browse/HADOOP-124
     Project: Hadoop
        Type: Bug
  Components: dfs
    Versions: 0.2
 Environment: ~30 node cluster
    Reporter: Bryan Pendleton
    Assignee: Konstantin Shvachko
    Priority: Critical
     Fix For: 0.3
 Attachments: DatanodeRegister.txt, DirNotSharing.patch

DFS files are still rotting.
I suspect that there's a problem with block accounting/detecting
identical hosts in the namenode. I have 30 physical nodes, with various
numbers of local disks, meaning that my current "bin/hadoop dfs -report"
shows 80 nodes after a full restart. However, when I discovered the
problem (which resulted in losing about 500 GB worth of temporary data
because of missing blocks in some of the larger chunks) -report showed
96 nodes. I suspect somehow there were extra datanodes running against
the same paths, and that the namenode was counting those as replicated
instances, which then showed up over-replicated, and one of them was
told to delete its local block, leading to the block actually getting
lost.
I will debug it more the next time the situation arises. This is at
least the 5th time I've had a large amount of file data "rot" in DFS
since January.
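
One common way to keep two processes from using the same directory is an
exclusive lock on a marker file inside it. The sketch below shows that
general technique only; it is not a description of the attached
DirNotSharing.patch, and the class name and the "dir.lock" file name are
assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical sketch; the class and the lock file name are illustrative.
class DataDirLock {
    // Tries to take an exclusive lock on a marker file inside dataDir.
    // Returns the held lock, or null if the directory is already in use.
    static FileLock tryLock(File dataDir) throws IOException {
        File lockFile = new File(dataDir, "dir.lock");
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rws");
        FileLock lock = null;
        try {
            // Returns null if another process holds the lock.
            lock = raf.getChannel().tryLock();
        } catch (OverlappingFileLockException e) {
            // This same JVM already holds a lock on the file.
        } finally {
            if (lock == null) {
                raf.close();
            }
        }
        return lock;
    }
}
```

A datanode would call DataDirLock.tryLock(dir) at startup for each data
directory and refuse to start if it gets null back. The lock is released
when the process exits, so a crashed datanode does not block a restart.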
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
