[ 
http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426343 ] 
            
Bryan Pendleton commented on HADOOP-64:
---------------------------------------

Why do datanodes need to checkpoint? What's the value of writing out the 
block mapping, vs. re-enumerating the blocks at startup time? The namenode 
doesn't keep track of what nodes have which blocks, so why should a storage 
node keep track any more rigorously within its own state? I'd argue that all 
of that complexity is needless - the cost of maintaining a consistent state is 
way too high for so little benefit.
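
To illustrate what I mean by re-enumerating at startup, here is a minimal 
sketch. It assumes block files sit directly in a data directory and are named 
"blk_" followed by a numeric id; the class name, the method name, and the flat 
directory layout are all hypothetical and not taken from the actual DataNode 
code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: rebuild the block list by scanning a data directory
// at startup instead of restoring it from a checkpoint.
final class BlockScanner {
    // Assumed naming convention for block files.
    static final String BLOCK_PREFIX = "blk_";

    /** Returns the ids of all block files found directly under dataDir. */
    static List<Long> scan(File dataDir) {
        List<Long> blockIds = new ArrayList<>();
        File[] files = dataDir.listFiles();
        if (files == null) {
            return blockIds; // not a directory, or unreadable
        }
        for (File f : files) {
            String name = f.getName();
            if (!f.isFile() || !name.startsWith(BLOCK_PREFIX)) {
                continue; // ignore anything that does not look like a block
            }
            try {
                blockIds.add(Long.parseLong(name.substring(BLOCK_PREFIX.length())));
            } catch (NumberFormatException e) {
                // Name has the prefix but no numeric id; skip it.
            }
        }
        return blockIds;
    }
}
```

With several volumes, the same scan would just run once per data directory, 
with no on-disk state to keep consistent.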

Please make it very easy to change the block-allocation code. The default 
behavior of the current code has been causing trouble on my very 
heterogeneous cluster for a very long time - uniform distribution only makes 
sense if the same amount of space is available on each drive. 
In all other cases, it leads immediately to unnecessary failures.
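
As one example of an alternative policy, here is a minimal sketch that picks a 
volume with probability proportional to its free space and skips volumes that 
cannot hold the block. The Volume type, the choose() method, and the policy 
itself are my own illustration, not existing code and not necessarily what the 
patch should ship.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a free-space-weighted allocation policy.
final class WeightedVolumeChooser {
    /** Illustrative stand-in for one storage volume. */
    static final class Volume {
        final String name;
        final long freeBytes;

        Volume(String name, long freeBytes) {
            this.name = name;
            this.freeBytes = freeBytes;
        }
    }

    /**
     * Chooses a volume for a block of blockSize bytes (assumed positive).
     * Volumes with less free space than blockSize are never chosen; the
     * rest are weighted by free space. Returns null if nothing fits.
     */
    static Volume choose(List<Volume> volumes, long blockSize, Random rng) {
        long totalFree = 0;
        Volume lastEligible = null;
        for (Volume v : volumes) {
            if (v.freeBytes >= blockSize) {
                totalFree += v.freeBytes;
                lastEligible = v;
            }
        }
        if (lastEligible == null) {
            return null; // no volume can hold the block
        }
        // Pick a point in [0, totalFree) and find the volume that covers it.
        long pick = (long) (rng.nextDouble() * totalFree);
        for (Volume v : volumes) {
            if (v.freeBytes < blockSize) {
                continue;
            }
            if (pick < v.freeBytes) {
                return v;
            }
            pick -= v.freeBytes;
        }
        return lastEligible; // guards against floating-point rounding
    }
}
```

If the chooser sits behind a small interface like this, swapping uniform 
distribution for something space-aware becomes a local change.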

I'm not sure about the "blocks considered lost on read-only volumes" bit, but 
if that implies that the blocks become unavailable, then I think the approach 
is too heavy-handed. Those blocks might be the only copies, and ignoring them 
means that the cluster might not be able to find a live copy of a block 
anywhere else. Please clarify what a "lost" block is.

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a 
> node runs its disks JBOD this means running a Datanode per disk on the 
> machine. While the scheme works reasonably well on small clusters, on larger 
> installations (several 100 nodes) it implies a very large number of Datanodes 
> with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a 
> single machine.


        
