[ 
http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426764 ] 
            
Milind Bhandarkar commented on HADOOP-64:
-----------------------------------------

Thanks for your input, Yoram, Konstantin, Bryan, and Sameer.

Here is my modified proposal:

1. The config parameter dfs.data.dir could take a list of directories separated 
by commas.
2. Another config parameter (client.buffer.dir) will contain a comma-separated 
list of directories for buffering blocks until they are sent to a datanode. The 
DFS client will manage an in-memory map of blocks to these directories (a small 
parsing sketch for these lists follows this list).
3. The datanode will maintain an in-memory map of block IDs to storage 
locations.
4. The datanode will choose an appropriate location to write a block based on a 
separate block-to-volume placement strategy. Information about volumes will be 
made available to this strategy via DF (see the volume sketch after this list).
5. The datanode will try to report the correct available disk space by taking 
into account the space reported by DF on each volume. If more than one volume 
shares the same mount point, the available disk space will not be counted 
twice.
6. The storage ID will be unique per datanode, and will be stored at the top 
level of each volume.
7. Each volume will further be divided into a shallow directory hierarchy, with 
a maximum of N blocks per directory. This block-to-directory mapping will also 
be maintained in a hashtable by the datanode. As a directory fills up, a new 
directory will be created as a sibling, up to a maximum of N siblings; then a 
second level of directories will start (see the directory sketch after this 
list). The parameter N can be specified via the config variable 
"dfs.data.numdir".
8. The datanode will shut down only if all the volumes specified in 
dfs.data.dir are read-only. Otherwise, it will log the read-only directories 
and treat them as if they had never been specified in the dfs.data.dir list. 
This behavior is consistent with the current implementation.
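
To make the directory lists in (1) and (2) concrete, here is a minimal parsing 
sketch. The class and method names are placeholders, not the final API:

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.StringTokenizer;

// Sketch only: split a comma-separated directory list such as the proposed
// dfs.data.dir or client.buffer.dir values into File objects.
class DirListParser {
  static File[] parseDirs(String dirList) {
    ArrayList dirs = new ArrayList();
    StringTokenizer tok = new StringTokenizer(dirList, ",");
    while (tok.hasMoreTokens()) {
      dirs.add(new File(tok.nextToken().trim()));
    }
    return (File[]) dirs.toArray(new File[dirs.size()]);
  }
}
{code}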
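
For (4) and (5), a rough sketch of what the placement and space-reporting code 
could look like, assuming a hypothetical FSVolume abstraction whose available() 
and getMount() methods would be backed by DF. The most-free-space policy shown 
here is only one possible placement strategy, not a final decision:

{code}
import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical per-volume view; a real implementation would back these
// methods with the DF utility rather than anything hard-coded here.
interface FSVolume {
  long available();      // free bytes on this volume, as reported by df
  String getMount();     // mount point this volume lives on
  File getDir();         // the dfs.data.dir entry for this volume
}

class VolumePolicy {

  /** Item 4: pick a volume for a new block (here: most free space). */
  static FSVolume chooseVolume(FSVolume[] volumes, long blockSize)
      throws IOException {
    FSVolume best = null;
    for (int i = 0; i < volumes.length; i++) {
      if (volumes[i].available() < blockSize) {
        continue;                            // volume cannot hold the block
      }
      if (best == null || volumes[i].available() > best.available()) {
        best = volumes[i];
      }
    }
    if (best == null) {
      throw new IOException("No volume can hold " + blockSize + " bytes");
    }
    return best;
  }

  /** Item 5: total free space, counting each mount point only once. */
  static long totalAvailable(FSVolume[] volumes) {
    Set seenMounts = new HashSet();          // mount points already counted
    long total = 0;
    for (int i = 0; i < volumes.length; i++) {
      if (seenMounts.add(volumes[i].getMount())) {  // false if already seen
        total += volumes[i].available();
      }
    }
    return total;
  }
}
{code}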
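
And a sketch of the directory layout in (7). The "subdirN" naming and the 
choice to start the next level under the last full sibling are assumptions for 
illustration only:

{code}
import java.io.File;

// Sketch only: hands out block directories so that each directory holds at
// most N blocks and has at most N "subdirX" siblings; once all siblings at a
// level are full, a new level is started below (item 7). N would come from
// the proposed dfs.data.numdir config variable.
class BlockDirAllocator {
  private final File root;          // top level of one volume
  private final int maxPerDir;      // N
  private File current;             // directory currently being filled
  private int siblingIndex = 0;     // index of 'current' among its siblings
  private int blocksInCurrent = 0;
  private BlockDirAllocator child;  // next level down, once this one is full

  BlockDirAllocator(File root, int maxPerDir) {
    this.root = root;
    this.maxPerDir = maxPerDir;
    this.current = new File(root, "subdir0");
    this.current.mkdirs();
  }

  /** Returns the directory the next block should be written into. */
  synchronized File nextBlockDir() {
    if (blocksInCurrent < maxPerDir) {
      blocksInCurrent++;
      return current;                        // room left in current directory
    }
    if (siblingIndex + 1 < maxPerDir) {
      siblingIndex++;                        // open the next sibling
      current = new File(root, "subdir" + siblingIndex);
      current.mkdirs();
      blocksInCurrent = 1;
      return current;
    }
    if (child == null) {                     // all N siblings full: go deeper
      child = new BlockDirAllocator(current, maxPerDir);
    }
    return child.nextBlockDir();
  }
}
{code}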

If there are any other issues to think about, please comment.


> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a 
> node runs its disks as JBOD, this means running a Datanode per disk on the 
> machine. While this scheme works reasonably well on small clusters, on larger 
> installations (several hundred nodes) it implies a very large number of 
> Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a 
> single machine.


        
