[jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes

Kihwal Lee (JIRA) Mon, 09 Jun 2014 14:23:37 -0700

    [ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025761#comment-14025761 ]


Kihwal Lee commented on HDFS-6482:
----------------------------------

Block IDs are sequential nowadays. With the proposed block distribution method,
leaf dirs can become severely unbalanced, especially in smaller clusters. Besides
the cost of looking up entries in a large directory, directory lock contention
can become high and hurt performance if many files are created in and read from
a small set of directories. I think limiting the number of blocks per directory
to 64 effectively capped how contentious a single directory could get. We might
do better by distributing blocks more evenly.
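
To illustrate the concern, here is a rough sketch, not the code from the
patch. It assumes a hypothetical layout that picks two directory levels from
bits 16-23 and 8-15 of the block ID, and compares it with one possible way of
scrambling the ID first. The class name, method names, the starting ID and the
multiplier are all made up for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

class BlockLayoutSketch {
    // Hypothetical ID-based layout: two levels of 256 directories, taken from
    // bits 16-23 and 8-15 of the block ID. The patch may slice the ID differently.
    static String idBasedDir(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF);
        int d2 = (int) ((blockId >> 8) & 0xFF);
        return "subdir" + d1 + "/subdir" + d2;
    }

    // One possible alternative (an assumption, not a proposal from the issue):
    // multiply by an odd 64-bit constant and use the top 16 bits, so that
    // neighbouring IDs land in unrelated directories.
    static String scrambledDir(long blockId) {
        long h = blockId * 0x9E3779B97F4A7C15L; // overflow wraps, which is intended
        int d1 = (int) (h >>> 56);
        int d2 = (int) ((h >>> 48) & 0xFF);
        return "subdir" + d1 + "/subdir" + d2;
    }

    // Count how many of `count` consecutive block IDs fall into each leaf dir.
    static Map<String, Integer> histogram(long firstId, int count, boolean scramble) {
        Map<String, Integer> perDir = new HashMap<>();
        for (long id = firstId; id < firstId + count; id++) {
            String dir = scramble ? scrambledDir(id) : idBasedDir(id);
            perDir.merge(dir, 1, Integer::sum);
        }
        return perDir;
    }

    // Largest number of blocks that ended up in any single leaf dir.
    static int maxPerDir(Map<String, Integer> perDir) {
        int max = 0;
        for (int n : perDir.values()) {
            max = Math.max(max, n);
        }
        return max;
    }

    public static void main(String[] args) {
        long firstId = 1L << 30; // arbitrary starting ID for the illustration
        for (boolean scramble : new boolean[] {false, true}) {
            Map<String, Integer> perDir = histogram(firstId, 1000, scramble);
            System.out.println((scramble ? "scrambled" : "id-based")
                + ": dirs used=" + perDir.size()
                + ", max blocks in one dir=" + maxPerDir(perDir));
        }
    }
}
```

Under these assumptions, 1000 consecutive block IDs end up in only 4 leaf
dirs with the plain ID-based mapping, with up to 256 blocks in one dir, while
the scrambled mapping spreads them over many more. On a small cluster a
datanode receives a large share of consecutive IDs, so the first case is
roughly what its active directories would look like.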

> Use block ID-based block layout on datanodes
> --------------------------------------------
>
>                 Key: HDFS-6482
>                 URL: https://issues.apache.org/jira/browse/HDFS-6482
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch
>
>
> Right now blocks are placed into directories that are split into many 
> subdirectories when capacity is reached. Instead we can use a block's ID to 
> determine the path it should go in. This eliminates the need for the LDir 
> data structure that facilitates the splitting of directories when they reach 
> capacity as well as fields in ReplicaInfo that keep track of a replica's 
> location.
> An extension of the work in HDFS-3290.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
