[ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426393 ]

Konstantin Shvachko commented on HADOOP-64:
-------------------------------------------
= I believe there was a misunderstanding on the datanode checkpointing issue. HADOOP-306 proposes to checkpoint only the list of datanodes, effectively DatanodeInfo. It was not meant to store the datanode block reports. The block map is not, and should not be, checkpointed.

= DF on Windows returns the drive letter, which can be used to distinguish disks. It works only for local disks, though; mounted (mapped network) drives on Windows won't work.

= I agree the storageID should be the same per node. It will need to be stored separately on each drive: otherwise, if only one drive stores the id and that drive gets corrupted, we will not be able to restore the storage id for the other drives. Also, the storage file on each drive should be locked when the datanode starts, to prevent running multiple datanodes with the same blocks.

= It is a good idea for the number of directories to be a power of 2. But I do not support the idea of reserving some number of block-id bits to determine block locations, for two reasons. a) Block replicas can have different locations on different datanodes. b) The block id is issued by the namenode, and it is not good if the namenode needs to know about a datanode's storage setup. Instead, we can partition the bit representation of the block id into a number of parts consistent with the number of directories and, e.g., XOR them. The result will represent the directory name. I think this will be random enough.

= I don't think the datanode can even be started on a read-only disk. The storage file won't open.

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a
> node runs its disks JBOD this means running a Datanode per disk on the
> machine. While the scheme works reasonably well on small clusters, on larger
> installations (several 100 nodes) it implies a very large number of Datanodes
> with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a
> single machine.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
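The XOR-folding idea from the comment above (partition the block id's bits into chunks matching the power-of-2 directory count and XOR them together) could be sketched as follows. This is a hypothetical illustration, not the actual Hadoop implementation; the class and method names are invented.

```java
// Hypothetical sketch of the XOR block-id -> directory mapping proposed above.
public class BlockDirMapper {

    /**
     * Fold all 64 bits of the block id into log2(numDirs) bits by XOR-ing
     * successive chunks together. numDirs must be a power of 2, so every
     * chunk is masked with (numDirs - 1).
     */
    static int dirIndex(long blockId, int numDirs) {
        int bits = Integer.numberOfTrailingZeros(numDirs); // log2(numDirs)
        int idx = 0;
        // Unsigned shift so negative block ids terminate the loop too.
        for (long v = blockId; v != 0; v >>>= bits) {
            idx ^= (int) (v & (numDirs - 1));
        }
        return idx;
    }

    public static void main(String[] args) {
        int numDirs = 64;                      // e.g. 2^6 subdirectories
        long blockId = 0x123456789ABCDEF0L;    // arbitrary example id
        System.out.println("block " + blockId
            + " -> subdir " + dirIndex(blockId, numDirs));
    }
}
```

Because every bit of the id contributes to the XOR, the resulting directory index stays independent of which bits the namenode happens to assign, which is the point of the proposal: no reserved bits, and each datanode can fold independently.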
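The per-drive locking suggested in the comment above (lock the storage file on each drive at startup so two datanodes cannot serve the same blocks) could be sketched with `java.nio` file locks. The class, method, and lock-file names here are assumptions for illustration, not the real Hadoop code.

```java
// Hypothetical sketch of locking each storage directory at datanode startup.
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class DriveLocker {

    /**
     * Try to take an exclusive OS-level lock on <dir>/in_use.lock.
     * Returns the lock on success, or null if another process (or an
     * error) prevents locking -- in which case the datanode should
     * refuse to use this drive.
     */
    static FileLock tryLockDrive(File dir) {
        try {
            RandomAccessFile raf =
                new RandomAccessFile(new File(dir, "in_use.lock"), "rws");
            FileLock lock = raf.getChannel().tryLock();
            if (lock == null) {
                raf.close();   // lock held by another process
            }
            return lock;
        } catch (Exception e) {
            // Includes OverlappingFileLockException when this JVM
            // already holds the lock.
            return null;
        }
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        FileLock lock = tryLockDrive(dir);
        System.out.println(lock != null ? "locked " + dir : "already in use");
    }
}
```

Taking one lock per drive (rather than one per node) matches the comment's point that each drive carries its own copy of the storage metadata: a second datanode accidentally pointed at any one of the drives fails fast instead of double-serving its blocks.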
