[ 
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025559#comment-14025559
 ] 

Arpit Agarwal commented on HDFS-6482:
-------------------------------------

{{DFS_DATANODE_NUMBLOCKS_DEFAULT}} is currently 64. I am not sure why the 
default was set so low. It would be good to know the reason before we change 
the behavior. It was quite possibly an arbitrary choice.

After ~4 million blocks we would start putting more than 256 blocks in each 
leaf subdirectory. With every additional 4M blocks, we'd add another 256 
files to each leaf. I think this is fine, since reaching 4 million blocks is 
itself very unlikely. I recall that, as late as Vista, NTFS directory 
listings would get noticeably slow with thousands of files per directory. Is 
there any performance loss from always having three levels of subdirectories, 
each restricted to at most 256 children?
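
To make the arithmetic above concrete, here is a rough sketch of the 
capacity calculation. The class and method names are hypothetical, and the 
figure of 16,384 leaf directories is only inferred from the numbers above 
(4M / 256), assuming block IDs spread evenly across leaves. It is not taken 
from the patch.

```java
/** Hypothetical helper for the capacity arithmetic; not part of the patch. */
public class LeafCapacitySketch {

    /** Number of leaf directories for a fixed fan-out at every level. */
    static long leafDirs(int fanOut, int levels) {
        long leaves = 1;
        for (int i = 0; i < levels; i++) {
            leaves *= fanOut;
        }
        return leaves;
    }

    /**
     * Blocks that fit before any leaf exceeds perLeafLimit, assuming
     * block IDs are spread evenly across the leaves (an assumption).
     */
    static long blocksBeforeExceeding(long leaves, long perLeafLimit) {
        return leaves * perLeafLimit;
    }

    public static void main(String[] args) {
        // 16,384 leaves is inferred from "4M blocks -> 256 per leaf".
        System.out.println(blocksBeforeExceeding(16_384L, 256L));

        // Three levels of 256 children each, as asked about above.
        long threeLevelLeaves = leafDirs(256, 3);
        System.out.println(threeLevelLeaves);
        System.out.println(blocksBeforeExceeding(threeLevelLeaves, 256L));
    }
}
```

Under these assumptions, three levels of 256 children would keep every leaf 
at or below 256 blocks for far more blocks than the ~4 million figure.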

- Who removes empty subdirectories when blocks are deleted?
- Let's avoid suffixing hex numerals to "subdir" for consistency with the 
existing naming convention.
- StringBuilder looks unnecessary in {{idToBlockDir}}.
- We should add a release note stating that {{DFS_DATANODE_NUMBLOCKS_DEFAULT}} 
is obsolete.
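
To illustrate the naming and StringBuilder points, here is a minimal sketch 
of what I have in mind. It is not the code from the attached patches: the 
class name, the two-level depth and the choice of ID bits are assumptions 
made only for the example. It uses plain string concatenation and decimal 
suffixes after "subdir".

```java
import java.io.File;

/** Hypothetical sketch of an ID-based layout; not the patch's code. */
public class BlockDirSketch {

    // Assumption: each level has at most 256 children (8 bits per level).
    static final int LEVEL_MASK = 0xFF;

    /** Maps a block ID to its directory under root, e.g. subdir18/subdir52. */
    static File idToBlockDir(File root, long blockId) {
        // Assumption: bits 16-23 pick the first level, bits 8-15 the second.
        int d1 = (int) ((blockId >> 16) & LEVEL_MASK);
        int d2 = (int) ((blockId >> 8) & LEVEL_MASK);
        // Plain concatenation is enough here; no StringBuilder needed.
        // Suffixes are decimal, not hex.
        String path = "subdir" + d1 + File.separator + "subdir" + d2;
        return new File(root, path);
    }

    public static void main(String[] args) {
        File root = new File("finalized"); // placeholder root directory
        System.out.println(idToBlockDir(root, 0x123456L));
    }
}
```

Since the path is a pure function of the block ID, nothing about a 
replica's location has to be stored separately.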

The approach looks good and a big +1 for removing LDir.

> Use block ID-based block layout on datanodes
> --------------------------------------------
>
>                 Key: HDFS-6482
>                 URL: https://issues.apache.org/jira/browse/HDFS-6482
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch
>
>
> Right now blocks are placed into directories that are split into many 
> subdirectories when capacity is reached. Instead we can use a block's ID to 
> determine the path it should go in. This eliminates the need for the LDir 
> data structure that facilitates the splitting of directories when they reach 
> capacity as well as fields in ReplicaInfo that keep track of a replica's 
> location.
> This is an extension of the work in HDFS-3290.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
