[
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037061#comment-14037061
]
Arpit Agarwal commented on HDFS-6482:
-------------------------------------
Good point about the dentry cache. I did not spend enough time to fully
understand your probabilistic analysis. However, with a quick-and-dirty
calculation I agree that blowup is unlikely.
Even assuming 64 TB disks, an 8 MB average block size (very conservative), and
a uniform distribution of block files across subdirectories, the expected
number of files per subdirectory is 2 * (64 TB / (8 MB * 256 * 256)) = 256
(the factor of 2 counts each block's data file and its .meta checksum file).
The 2-level approach looks fine to me.
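The arithmetic above can be checked with a few lines of Python (the 64 TB /
8 MB / 256 x 256 figures are the assumptions stated in the comment):

```python
# Sanity check of the files-per-subdirectory estimate above.
# Assumptions (from the comment): 64 TB disk, 8 MB average block size,
# blocks spread uniformly over a 256 x 256 two-level subdirectory tree;
# the factor of 2 counts each block file plus its .meta checksum file.
DISK_BYTES = 64 * 2**40           # 64 TB
AVG_BLOCK_BYTES = 8 * 2**20       # 8 MB
SUBDIRS = 256 * 256               # two levels of 256 subdirectories each

blocks = DISK_BYTES // AVG_BLOCK_BYTES        # 8,388,608 blocks
files_per_subdir = 2 * blocks // SUBDIRS      # block files + .meta files
print(files_per_subdir)                       # -> 256
```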
> Use block ID-based block layout on datanodes
> --------------------------------------------
>
> Key: HDFS-6482
> URL: https://issues.apache.org/jira/browse/HDFS-6482
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.5.0
> Reporter: James Thomas
> Assignee: James Thomas
> Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.3.patch,
> HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.patch
>
>
> Right now, blocks are placed into directories that are split into many
> subdirectories when capacity is reached. Instead, we can use a block's ID to
> determine the path it should go in. This eliminates the need for the LDir
> data structure that facilitates the splitting of directories when they reach
> capacity, as well as the fields in ReplicaInfo that keep track of a replica's
> location.
> An extension of the work in HDFS-3290.
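A minimal sketch of the ID-based path mapping the description proposes. The
choice of two 8-bit levels here is my assumption, chosen to match the
256 * 256 figure used in the comment; the bit selection and directory naming
in the actual patch may differ:

```python
def id_to_block_path(root: str, block_id: int) -> str:
    # Hypothetical sketch: derive each directory level from one byte of
    # the block ID, giving a fixed 256 x 256 subdirectory tree. Because
    # the path is a pure function of the ID, no per-replica location
    # field or directory-splitting structure (LDir) is needed.
    d1 = (block_id >> 16) & 0xFF   # first-level subdirectory
    d2 = (block_id >> 8) & 0xFF    # second-level subdirectory
    return f"{root}/subdir{d1}/subdir{d2}/blk_{block_id}"

print(id_to_block_path("/data/dn", 305419896))
```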
--
This message was sent by Atlassian JIRA
(v6.2#6252)