[ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12427026 ]
Konstantin Shvachko commented on HADOOP-64:
-------------------------------------------
This proposal looks good to me.
The only thing that seems excessive is the dynamic data structure for
maintaining the blockid-to-directory mapping.
The alternative is a static mapping based on blockids and the number of
directories.
Suppose that the maximal number of entries per directory is N. We should define
a function
dirName( blockId, N, dirLevel )
which returns a local directory name for each level of the directory tree.
So the datanode needs to store only the current height of the directory tree, H.
Then for a given blockId, its path is determined by
/ dirName(blockId,N,0) / dirName(blockId,N,1) / ... / dirName(blockId,N,H)
And when the datanode needs to add a new directory level it will not need
to rename anything in the existing directory tree.
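A minimal sketch of one possible static mapping follows. The comment does not
specify how dirName is computed, so the choice below (the dirLevel-th base-N
digit of the blockid) and the names dir_name, block_path and the "subdir"
prefix are assumptions made only for illustration.

```python
def dir_name(block_id: int, n: int, dir_level: int) -> str:
    """Return the directory name for block_id at one level of the tree.

    Assumption: the name is derived from the dir_level-th base-n digit
    of block_id. The original comment leaves this function undefined.
    """
    if n < 1:
        raise ValueError("n must be positive")
    if dir_level < 0:
        raise ValueError("dir_level must be non-negative")
    # Floor division and modulo keep the digit in [0, n), even for negative ids.
    digit = (block_id // n ** dir_level) % n
    return f"subdir{digit}"


def block_path(block_id: int, n: int, height: int) -> str:
    """Join the names for levels 0..height, following the path formula above."""
    return "/".join(dir_name(block_id, n, level) for level in range(height + 1))
```

Under these assumptions, with N = 4 the blockid 27 maps to
subdir3/subdir2/subdir1 at H = 2. At H = 3 the path becomes
subdir3/subdir2/subdir1/subdir0, so the path for height H is a prefix of the
path for height H + 1 and the existing directory names stay unchanged.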
One disadvantage of this approach is that the directories must be
restructured if the maximal number of entries per directory is changed.
But the same applies to the dynamic approach, at least when N is
decreased.
We might consider hardcoding N rather than having it configurable.
> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
> Key: HADOOP-64
> URL: http://issues.apache.org/jira/browse/HADOOP-64
> Project: Hadoop
> Issue Type: Improvement
> Components: dfs
> Affects Versions: 0.2.0
> Reporter: Sameer Paranjpye
> Assigned To: Milind Bhandarkar
> Priority: Minor
> Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a
> node runs its disks JBOD this means running a Datanode per disk on the
> machine. While the scheme works reasonably well on small clusters, on larger
> installations (several hundred nodes) it implies a very large number of Datanodes
> with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a
> single machine.