[ 
http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426347 ] 
            
Sameer Paranjpye commented on HADOOP-64:
----------------------------------------

Can we effectively map volumes to devices on Windows? Will 'df' under 
cygwin produce a comprehensible mapping of paths to devices? Maybe this should 
be left out of the implementation?

Code for monitoring disk capacity on the datanode will need to be updated to 
run 'df' on all volumes considered.  Round robin placement needs to account for 
differences in capacity on the various volumes.
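
To make the capacity point concrete, here is a rough sketch of round robin placement that skips volumes without enough room. The VolumeChooser and Volume names are hypothetical and not taken from the datanode code. The free-space figures are assumed to be supplied by whatever runs 'df' per volume.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: round robin over volumes, skipping any that lack space. */
class VolumeChooser {
    /** Hypothetical view of one volume; 'available' is assumed to come from 'df'. */
    static class Volume {
        final String path;
        long available; // free bytes, as last reported for this volume

        Volume(String path, long available) {
            this.path = path;
            this.available = available;
        }
    }

    private final List<Volume> volumes = new ArrayList<Volume>();
    private int next = 0; // index of the volume to try first on the next call

    void add(Volume v) {
        volumes.add(v);
    }

    /** Returns the next volume in rotation that can hold blockSize bytes, or null if none can. */
    synchronized Volume choose(long blockSize) {
        for (int i = 0; i < volumes.size(); i++) {
            Volume v = volumes.get(next);
            next = (next + 1) % volumes.size();
            if (v.available >= blockSize) {
                v.available -= blockSize; // reserve the space until the next 'df' refresh
                return v;
            }
        }
        return null; // every volume is too full for this block
    }
}
```

A small volume simply drops out of the rotation once it fills, while larger volumes keep taking blocks. Weighting the rotation by free space would be another option; this sketch does not attempt that.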

How does this interact with Konstantin's storage id implementation? We will now 
need to have a single storage-id across multiple volumes.

Do we need to use the last x-bits of a block to map it to a directory? Maybe we 
should use a simple round robin scheme here as well. The amount of state is 
small enough to keep in a hashtable, no?
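
As a sketch of what I mean by round robin plus a hashtable (BlockDirectoryMap is a hypothetical name, and block ids are assumed to be longs), something like the following would do:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: assign blocks to directories round robin, remembering each choice in memory. */
class BlockDirectoryMap {
    private final String[] dirs;
    private final Map<Long, String> blockToDir = new HashMap<Long, String>();
    private int next = 0; // index of the directory the next new block will get

    BlockDirectoryMap(String[] dirs) {
        if (dirs.length == 0) {
            throw new IllegalArgumentException("at least one directory is required");
        }
        this.dirs = dirs.clone();
    }

    /** Returns the directory for blockId, assigning the next one in rotation if the block is new. */
    synchronized String assign(long blockId) {
        String existing = blockToDir.get(blockId);
        if (existing != null) {
            return existing;
        }
        String dir = dirs[next];
        next = (next + 1) % dirs.length;
        blockToDir.put(blockId, dir);
        return dir;
    }

    /** Returns the directory holding blockId, or null if the block is unknown. */
    synchronized String lookup(long blockId) {
        return blockToDir.get(blockId);
    }
}
```

The map holds one entry per block. I am assuming it could be rebuilt by scanning the directories at startup, so nothing would need to go into a side file.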

Do we ever need to checkpoint datanodes? Seems like that is a separable 
discussion. In any case, it seems like the less state we keep in side files the 
better it is.

We should include a mechanism to make read-only volumes visible on the 
namenode, as part of the health/status page, so that admins can be alerted in a 
timely manner.

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a 
> node runs its disks JBOD this means running a Datanode per disk on the 
> machine. While the scheme works reasonably well on small clusters, on larger 
> installations (several hundred nodes) it implies a very large number of Datanodes 
> with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a 
> single machine.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
