[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Thomas updated HDFS-6482:
-------------------------------

    Attachment: HDFS-6482.3.patch

Made changes suggested by Arpit. I don't think that deletion of empty 
directories is necessary -- it was not done in the previous scheme and the 
benefit in terms of faster directory listings and lookups seems marginal (and 
there is some chance that the directory will be recreated at a later time). I 
have added a third subdir level (with the 25th to 32nd bits of the block ID) to 
further reduce the likelihood of directory blowup in large clusters. For a 
cluster with N blocks (to clarify, this means that N blocks have been created 
over the lifetime of the cluster, but some may have been deleted), the upper 
bound on the number of files in any DN directory is now N/2^24, so even for 
clusters with 2^30 (~1 billion) blocks created over their lifetimes we should 
have fairly small directories. I don't think there's any need to implement 
further logic to prevent a directory from exceeding 256 entries: by the bound 
above (2^32 / 2^24 = 256), this can't happen in clusters with fewer than 2^32 
blocks created, and even then the probability is very small.
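
To illustrate the three-level scheme described above, here is a minimal Java 
sketch of how a block ID could be mapped to a directory. It is not the code 
from the attached patch. The class name BlockDirSketch, the method 
idToRelativeDir, the "subdir" prefix, and the choice of the 9th-16th and 
17th-24th bits for the first two levels are all assumptions made for 
illustration; only the use of the 25th to 32nd bits for the third level comes 
from the comment.

```java
public final class BlockDirSketch {
    // Hypothetical directory-name prefix; the real name may differ.
    private static final String PREFIX = "subdir";

    private BlockDirSketch() {}

    /**
     * Maps a block ID to a three-level relative directory path.
     * Each level is selected by one 8-bit field of the ID, so every
     * level has at most 256 subdirectories.
     */
    public static String idToRelativeDir(long blockId) {
        // Assumption: level 1 uses the 9th-16th bits of the block ID.
        int d1 = (int) ((blockId >>> 8) & 0xFF);
        // Assumption: level 2 uses the 17th-24th bits.
        int d2 = (int) ((blockId >>> 16) & 0xFF);
        // From the comment: level 3 uses the 25th-32nd bits.
        int d3 = (int) ((blockId >>> 24) & 0xFF);
        return PREFIX + d1 + "/" + PREFIX + d2 + "/" + PREFIX + d3;
    }

    public static void main(String[] args) {
        // Bit fields of 0x01020304 are 0x03, 0x02 and 0x01 respectively.
        System.out.println(idToRelativeDir(0x01020304L));
    }
}
```

Because the path is a pure function of the block ID, no per-replica location 
needs to be stored and no directory ever has to be split; a lookup simply 
recomputes the path.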

> Use block ID-based block layout on datanodes
> --------------------------------------------
>
>                 Key: HDFS-6482
>                 URL: https://issues.apache.org/jira/browse/HDFS-6482
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.3.patch, 
> HDFS-6482.patch
>
>
> Right now blocks are placed into directories that are split into many 
> subdirectories when capacity is reached. Instead we can use a block's ID to 
> determine the path it should go in. This eliminates the need for the LDir 
> data structure that facilitates the splitting of directories when they reach 
> capacity as well as fields in ReplicaInfo that keep track of a replica's 
> location.
> An extension of the work in HDFS-3290.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
