[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025802#comment-14025802 ]

James Thomas commented on HDFS-6482:
------------------------------------

Kihwal, we were considering using some sort of deterministic probing (as in 
hash tables) to find less full directories when the initial directory for a 
block is full. Do you think the cost (and additional complexity) of such a 
scheme is justified, given the relatively low probability (at least under the 
uniform block distribution assumption) of directory blowup?
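To make the idea concrete, here is a minimal sketch of the probing fallback being discussed (all names, the 2^16 directory count, and the per-directory capacity are illustrative assumptions, not from any patch). It maps a block ID to a home directory and, if that directory is full, probes subsequent directories deterministically, as in open addressing:

```java
// Hypothetical sketch of deterministic probing over leaf directories.
// NUM_DIRS and CAPACITY are assumptions for illustration only.
public class DirProbe {
    static final int NUM_DIRS = 1 << 16;   // assumed 2^16 leaf directories
    static final int CAPACITY = 4096;      // hypothetical per-directory cap

    // Probe linearly from the block's home directory until a directory
    // with spare capacity is found. Note that reads would have to repeat
    // the same probe sequence, which is where the extra cost comes from.
    static int dirFor(long blockId, int[] dirCounts) {
        int home = (int) (blockId & (NUM_DIRS - 1));
        for (int i = 0; i < NUM_DIRS; i++) {
            int d = (home + i) & (NUM_DIRS - 1); // linear probing with wraparound
            if (dirCounts[d] < CAPACITY) {
                return d;
            }
        }
        return home; // every directory full; fall back to the home directory
    }
}
```

The complexity cost shows up on the read path: a lookup can no longer compute the path from the block ID alone but must walk the same probe sequence until it finds the replica.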

Additionally, I want to note that if the total number of blocks in the cluster 
is N, then ceil(N/2^16) bounds the number of blocks in a single directory on 
any DN, assuming completely sequential block IDs. So for a small cluster we 
can't see any blowup.
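The bound follows because, with sequential IDs spread over 2^16 leaf directories keyed by bits of the ID, two blocks share a directory only when their IDs are congruent mod 2^16. A one-line check of the arithmetic (the 2^16 directory count is taken from the N/2^16 figure above):

```java
// Verify the per-directory bound for sequential IDs 0..n-1 over 2^16 dirs.
public class BoundCheck {
    static int maxPerDir(long n) {
        long dirs = 1L << 16;
        return (int) ((n + dirs - 1) / dirs); // ceil(n / 2^16)
    }
}
```

For example, a cluster with one million blocks would put at most 16 blocks in any one directory.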

> Use block ID-based block layout on datanodes
> --------------------------------------------
>
>                 Key: HDFS-6482
>                 URL: https://issues.apache.org/jira/browse/HDFS-6482
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch
>
>
> Right now blocks are placed into directories that are split into many 
> subdirectories when capacity is reached. Instead we can use a block's ID to 
> determine the path it should go in. This eliminates the need for the LDir 
> data structure that facilitates the splitting of directories when they reach 
> capacity as well as fields in ReplicaInfo that keep track of a replica's 
> location.
> An extension of the work in HDFS-3290.
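For reference, the layout described above amounts to something like the following sketch (the shift/mask values and names are illustrative assumptions, not taken from the patch): two directory levels are derived from bits of the block ID, so a replica's path is computable without any per-replica location field or LDir splitting state.

```java
import java.io.File;

// Hypothetical ID-derived layout: one byte of the block ID per directory
// level, giving 256 x 256 = 2^16 leaf directories.
public class IdLayout {
    // Relative two-level subdirectory path derived from bits of the block ID.
    static String idToSubdirPath(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF); // bits 16..23 -> level 1
        int d2 = (int) ((blockId >> 8) & 0xFF);  // bits 8..15  -> level 2
        return "subdir" + d1 + File.separator + "subdir" + d2;
    }

    // Resolve the directory for a block under a given finalized dir.
    static File idToBlockDir(File finalizedDir, long blockId) {
        return new File(finalizedDir, idToSubdirPath(blockId));
    }
}
```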



--
This message was sent by Atlassian JIRA
(v6.2#6252)
