[
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025786#comment-14025786
]
James Thomas commented on HDFS-6482:
------------------------------------
Thanks for the review, Arpit, and thanks for the follow-up, Colin. I want to
clarify one thing -- the numbers 4 million and 16 million that both of you
mention are, as far as I understand, actually block counts for the ENTIRE
cluster, not for a single DN. If we had a cluster of 16 million blocks (with
sequential block IDs), we could in theory end up with a single DN directory
containing as many as 1024 entries, if we got unlucky with the assignment of
blocks to DNs. Assuming a uniform distribution of blocks across the DNs in
the cluster and a maximum of 2^24 blocks per DN, the expected number of
blocks per directory is 2^24 / 2^16 = 256. I don't know how accurate this
assumption is.
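For concreteness, here is a small sketch of the arithmetic above -- a two-level
layout where each level is indexed by 8 bits of the block ID, giving 256 * 256
= 2^16 leaf directories. The class and method names are made up for
illustration (this is not the patch's actual code, and the exact bits the
patch selects may differ):

```java
// Illustrative sketch of a block-ID-based directory layout (hypothetical
// names; not the real DatanodeUtil API). Two directory levels, each chosen
// from 8 bits of the block ID, so there are 2^16 leaf directories.
public class BlockDirSketch {
    // Map a block ID to its leaf directory path under a storage root.
    static String idToBlockDir(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF); // bits 16-23 pick level 1
        int d2 = (int) ((blockId >> 8) & 0xFF);  // bits 8-15 pick level 2
        return "subdir" + d1 + "/subdir" + d2;
    }

    public static void main(String[] args) {
        long maxBlocksPerDn = 1L << 24; // 2^24 blocks on one DN, per comment
        long leafDirs = 1L << 16;       // 256 * 256 leaf directories
        // Expected blocks per leaf directory under uniform distribution.
        System.out.println(maxBlocksPerDn / leafDirs); // 256
        System.out.println(idToBlockDir(0x12345678L)); // subdir52/subdir86
    }
}
```

Note that with sequential block IDs only the low 8 bits vary between
consecutive blocks, so runs of consecutive IDs share a leaf directory, which
is where the unlucky-assignment scenario above comes from.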
> Use block ID-based block layout on datanodes
> --------------------------------------------
>
> Key: HDFS-6482
> URL: https://issues.apache.org/jira/browse/HDFS-6482
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.5.0
> Reporter: James Thomas
> Assignee: James Thomas
> Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch
>
>
> Right now blocks are placed into directories that are split into many
> subdirectories when capacity is reached. Instead we can use a block's ID to
> determine the path it should go in. This eliminates the need for the LDir
> data structure that facilitates the splitting of directories when they reach
> capacity, as well as the fields in ReplicaInfo that keep track of a
> replica's location.
> An extension of the work in HDFS-3290.
--
This message was sent by Atlassian JIRA
(v6.2#6252)