[ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426354 ]
Milind Bhandarkar commented on HADOOP-64:
-----------------------------------------

About checkpointing datanodes: I agree that it would be needless complexity, 
and I was confused about this as well. But as Konstantin pointed out to me, the 
datanode checkpointing proposal is NOT about checkpointing the datanodes' own 
state; it is about including the datanodes' block reports in the namenode 
checkpoint. Thanks, Konstantin.

As the proposal (and the implementation) currently stands, if dfs.data.dir is 
read-only, the datanode reports itself as dead, since operations such as block 
deletion cannot be carried out on it. The namenode treats that datanode as 
dead and tries to re-replicate its blocks on other datanodes. The same 
behavior will continue, except that the datanode will no longer report itself 
as dead as long as at least one volume in the dfs.data.dir list is read-write. 
However, it will not report blocks contained in read-only volumes.
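
For illustration, a multi-volume configuration might look like the following 
hadoop-site.xml entry, with one directory per local disk (the paths here are 
hypothetical):

    <property>
      <name>dfs.data.dir</name>
      <!-- Comma-separated list of volumes; the datanode stays alive as
           long as at least one of these remains read-write. -->
      <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data</value>
    </property>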

There continues to be one Storage-ID per datanode. The placement of blocks 
across volumes is internal to the datanode.

DF.java contains code to detect the mount point, and this will be used to 
differentiate between disks. Even if the detection is wrong, it does not 
preclude correct operation of the datanode; only performance is affected. 
Performance will be maximized when all volumes specified in dfs.data.dir are 
located on different local disks.
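
As a rough sketch of what that detection can look like (this is not the actual 
DF.java code, and it is Unix-specific), one can run df against each configured 
directory and compare the reported mount points:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class MountSketch {
      /** Run "df -k <dir>" and return the mount point, i.e. the last
          field of the last output line. Illustration only. */
      public static String mountOf(String dir) throws Exception {
        Process p = Runtime.getRuntime().exec(new String[] {"df", "-k", dir});
        BufferedReader in =
            new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line, last = null;
        while ((line = in.readLine()) != null) {
          last = line;
        }
        p.waitFor();
        String[] fields = last.trim().split("\\s+");
        return fields[fields.length - 1];
      }

      public static void main(String[] args) throws Exception {
        // Two dfs.data.dir entries with the same mount point share a
        // physical disk; that costs performance, not correctness.
        System.out.println(mountOf("/disk1/dfs/data"));
        System.out.println(mountOf("/disk2/dfs/data"));
      }
    }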

Making read-only mounts visible on the namenode is an orthogonal issue. My 
proposal specifies a backward-compatible way of dealing with it.

Using the last x bits of the block-id to map a block to a local directory will 
minimize the datanode's state as well as keep each directory's size small 
(since block-ids are random). Consider it an implicit hashtable on disk.
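
A minimal sketch of that mapping, assuming x = 6 (the constant, names, and 
layout here are illustrative, not the actual on-disk scheme):

    public class BlockDirMap {
      private static final int BITS = 6;               // hypothetical x = 6
      private static final int NUM_DIRS = 1 << BITS;   // 64 subdirectories

      /** Map a block id to one of 2^x subdirectories using its low bits.
          Random block ids spread evenly across the directories. */
      public static String dirFor(long blockId) {
        int bucket = (int) (blockId & (NUM_DIRS - 1)); // last x bits
        return "subdir" + bucket;
      }

      public static void main(String[] args) {
        System.out.println(dirFor(1234567890123L));    // prints "subdir11"
      }
    }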

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a 
> node runs its disks as JBOD, this means running one Datanode per disk on the 
> machine. While this scheme works reasonably well on small clusters, on 
> larger installations (several hundred nodes) it implies a very large number 
> of Datanodes, with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a 
> single machine.
