[
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Thomas updated HDFS-6482:
-------------------------------
Attachment: HDFS-6482.3.patch
Made the changes suggested by Arpit.

I don't think that deleting empty directories is necessary -- it was not done
in the previous scheme, and the benefit in terms of faster directory listings
and lookups seems marginal (there is also some chance that the directory will
be recreated at a later time).

I have added a third subdir level (with the 25th to 32nd bits of the block ID)
to further reduce the likelihood of directory blowup in large clusters. For a
cluster with N blocks (to clarify, this means that N blocks have been created
over the lifetime of the cluster, though some may have been deleted), the upper
bound on the number of files in any DN directory is now N/2^24, so even for
clusters with 2^30 (~1 billion) blocks created over their lifetimes the bound
is 2^30/2^24 = 64 files per directory. I don't think there's any need to
implement further logic to prevent a directory from exceeding 256 entries,
since this can't happen in clusters with fewer than 2^32 blocks created
(N/2^24 stays below 256), and even then the probability is very small.
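
To make the layout described above concrete, here is a minimal sketch of how a
block ID could be mapped to a three-level directory path. It is not taken from
the patch. The class and method names, the "subdir" prefix, the choice of bits
17-24 and 9-16 for the other two levels, the ordering of the levels, and
counting bits from 1 at the least significant end are all assumptions made for
illustration. The second method only evaluates the N/2^24 figure quoted in the
comment; it does not verify that bound.

```java
// Hypothetical sketch; names and bit assignments are illustrative assumptions,
// not the code in HDFS-6482.3.patch.
class BlockIdLayoutSketch {
    // Assumed prefix for each directory level.
    static final String SUBDIR_PREFIX = "subdir";

    // Maps a block ID to a relative three-level directory path.
    // Each level uses 8 bits of the ID, so each level has 256 possible names.
    static String idToRelativeDir(long blockId) {
        int d1 = (int) ((blockId >> 24) & 0xff); // bits 25-32 (assumed outermost)
        int d2 = (int) ((blockId >> 16) & 0xff); // bits 17-24 (assumed)
        int d3 = (int) ((blockId >> 8) & 0xff);  // bits 9-16 (assumed)
        return SUBDIR_PREFIX + d1 + "/" + SUBDIR_PREFIX + d2 + "/"
                + SUBDIR_PREFIX + d3;
    }

    // Evaluates the quoted N/2^24 figure, rounded up, for N blocks created.
    // 2^24 is the number of leaf directories with 3 levels of 8 bits each.
    static long quotedBound(long blocksCreated) {
        long leafDirs = 1L << 24;
        return (blocksCreated + leafDirs - 1) / leafDirs;
    }

    public static void main(String[] args) {
        System.out.println(idToRelativeDir(0x12345678L)); // subdir18/subdir52/subdir86
        System.out.println(quotedBound(1L << 30));        // 64
        System.out.println(quotedBound(1L << 32));        // 256
    }
}
```

Because the path is a pure function of the block ID, nothing needs to be stored
per replica to find it again, which is the property the issue description
relies on when it proposes dropping LDir and the location fields in ReplicaInfo.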
> Use block ID-based block layout on datanodes
> --------------------------------------------
>
> Key: HDFS-6482
> URL: https://issues.apache.org/jira/browse/HDFS-6482
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.5.0
> Reporter: James Thomas
> Assignee: James Thomas
> Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.3.patch,
> HDFS-6482.patch
>
>
> Right now blocks are placed into directories that are split into many
> subdirectories when capacity is reached. Instead we can use a block's ID to
> determine the path it should go in. This eliminates the need for the LDir
> data structure that facilitates the splitting of directories when they reach
> capacity as well as fields in ReplicaInfo that keep track of a replica's
> location.
> An extension of the work in HDFS-3290.