[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037061#comment-14037061 ]

Arpit Agarwal commented on HDFS-6482:
-------------------------------------

Good point about the dentry cache. I did not spend enough time to understand 
your probabilistic analysis. However, with a quick-and-dirty calculation I 
agree that blowup is unlikely.

Even assuming 64TB disks, an 8MB average block size (very conservative) and a 
uniform distribution of block files across subdirs, the expected number of 
files per subdir is 2 * (64TB / (8MB * 256 * 256)) = 256, the factor of 2 
accounting for each block's data file and its .meta file. The 2-level approach 
looks fine to me.
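The back-of-the-envelope calculation above can be sketched as follows; the class and method names are illustrative, not part of the patch:

```java
// Estimate of the expected files per subdir, using the figures from the
// comment above: 64 TB disk, 8 MB average block size, two files per block
// (block file + .meta file), and 256 * 256 subdirectories in the 2-level layout.
public class SubdirEstimate {
    public static long expectedFilesPerSubdir(long diskBytes, long avgBlockBytes,
                                              long subdirs) {
        long blocks = diskBytes / avgBlockBytes; // blocks the disk can hold
        return 2 * blocks / subdirs;             // block file + meta file each
    }

    public static void main(String[] args) {
        long diskBytes = 64L * 1024 * 1024 * 1024 * 1024; // 64 TB
        long avgBlockBytes = 8L * 1024 * 1024;            // 8 MB
        long subdirs = 256L * 256;                        // two levels of 256
        // 64TB / 8MB = 8,388,608 blocks; * 2 files; / 65,536 subdirs = 256
        System.out.println(expectedFilesPerSubdir(diskBytes, avgBlockBytes, subdirs));
    }
}
```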

> Use block ID-based block layout on datanodes
> --------------------------------------------
>
>                 Key: HDFS-6482
>                 URL: https://issues.apache.org/jira/browse/HDFS-6482
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.3.patch, 
> HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.patch
>
>
> Right now blocks are placed into directories that are split into many 
> subdirectories when capacity is reached. Instead we can use a block's ID to 
> determine the path it should go in. This eliminates the need for the LDir 
> data structure that facilitates the splitting of directories when they reach 
> capacity, as well as the fields in ReplicaInfo that keep track of a replica's 
> location.
> An extension of the work in HDFS-3290.
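The ID-based layout described above can be sketched as a pure function from block ID to path. The byte choice here (two low-order bytes, giving 256 * 256 subdirs) and the names are illustrative assumptions, not the exact scheme in the attached patches:

```java
// Sketch of an ID-based block layout: derive a fixed two-level subdirectory
// path from the block ID alone, so the datanode needs no splitting state
// (no LDir) and no per-replica location fields.
public class BlockIdLayout {
    public static String blockIdToPath(long blockId) {
        int d1 = (int) ((blockId >> 8) & 0xFF); // first-level subdir, 0..255
        int d2 = (int) (blockId & 0xFF);        // second-level subdir, 0..255
        return String.format("subdir%d/subdir%d/blk_%d", d1, d2, blockId);
    }

    public static void main(String[] args) {
        // The path is deterministic: any node can recompute it from the ID.
        System.out.println(blockIdToPath(1073741825L));
    }
}
```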



--
This message was sent by Atlassian JIRA
(v6.2#6252)
