[ 
http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426393 ] 
            
Konstantin Shvachko commented on HADOOP-64:
-------------------------------------------

= I believe there was a misunderstanding on the datanode checkpointing issue.
HADOOP-306 proposes to checkpoint only the list of datanodes, effectively 
DatanodeInfo.
It was not meant to store the datanode block reports.
The block map is not and should not be checkpointed.

= DF on Windows will return the drive letter, which can be used to distinguish
disks. This works only for local disks, though; mapped (network) drives on
Windows won't work.

= I agree the storageID should be the same per node. It will need to be stored
separately on each drive. Otherwise, if only one drive stores the id and gets
corrupted, we will not be able to restore the storage id for the other drives.
Also, the storage files on each drive should be locked when the datanode
starts, to prevent running multiple data nodes with the same blocks.
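The per-drive locking proposed above could be sketched with java.nio advisory
file locks; a minimal sketch, assuming a lock file named "in_use.lock" in each
storage directory (the file name, class, and method are illustrative, not the
actual layout):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Sketch of per-drive storage locking. The lock-file name "in_use.lock"
// is an assumption for illustration.
public class StorageLock {
    // Acquire an exclusive advisory lock on a lock file inside the storage
    // directory, so a second datanode pointed at the same drive fails fast.
    static FileLock tryLockStorage(File storageDir) throws IOException {
        File lockFile = new File(storageDir, "in_use.lock");
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rws");
        FileLock lock = raf.getChannel().tryLock();
        if (lock == null) {               // already held by another process
            raf.close();
            throw new IOException("Storage " + storageDir + " is already locked");
        }
        return lock;  // hold for the datanode's lifetime; never release early
    }
}
```

The lock is taken once per drive at startup and held until shutdown, which is
what prevents two datanodes from serving the same blocks.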

= It is a good idea that the number of directories is a power of 2.
But I do not support the idea of reserving any number of bits of the block id
to determine block locations, for 2 reasons:
a) Block replicas can have different locations on different data nodes.
b) The block id is issued by the namenode, and it is not good if the namenode
needs to know about a datanode's storage setup.
Instead, we can partition the bit representation of the block id into a number
of parts consistent with the number of directories and, e.g., XOR them. The
result will represent the directory name. I think this will be random enough.
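The partition-and-XOR idea above can be sketched as follows: assuming the
directory count is a power of two, split the 64-bit block id into
log2(numDirs)-bit slices and fold them together with XOR (the class and method
names here are made up for illustration):

```java
// Sketch of the proposed scheme: partition the block id's bits into
// log2(numDirs)-bit slices and XOR them; the result, always in
// [0, numDirs), names the target directory. Names are illustrative.
public class BlockDirChooser {
    static int chooseDir(long blockId, int numDirs) {
        if (numDirs <= 1) {
            return 0;                      // single directory: nothing to choose
        }
        int bits = Integer.numberOfTrailingZeros(numDirs); // log2(numDirs)
        long mask = numDirs - 1;           // low-order slice mask
        long acc = 0;
        for (long v = blockId; v != 0; v >>>= bits) {
            acc ^= (v & mask);             // fold the next slice in
        }
        return (int) acc;
    }
}
```

Since the choice is purely local, each datanode can apply this with its own
numDirs, consistent with point (a) that replicas may live in different
directories on different nodes.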

= I don't think the datanode can even start on a read-only disk.
The storage file won't open.

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a 
> node runs its disks JBOD, this means running a Datanode per disk on the 
> machine. While the scheme works reasonably well on small clusters, on larger 
> installations (several hundred nodes) it implies a very large number of 
> Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a 
> single machine.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
