[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025693#comment-14025693 ]
Colin Patrick McCabe commented on HDFS-6482:
--------------------------------------------
bq. DFS_DATANODE_NUMBLOCKS_DEFAULT is currently 64. I am not sure why the
default was set so low. It would be good to know the reason before we change
the behavior. It was quite possibly an arbitrary choice.
So, back in the really old days (think ext2), there were performance issues for
directories with a large number of files (10,000+). See Wikipedia's page on
ext2: http://en.wikipedia.org/wiki/Ext2. The LDir subdirectory mechanism
was intended to alleviate this.
More recent filesystems like ext4 (and recent revisions of ext3) have what's
called "directory indices." This basically means that there is an index which
allows you to look up a particular entry in a directory in less than O(N) time.
This makes having directories with a huge number of entries possible.
It's still nice to have multiple directories to avoid overloading {{readdir}}
(when we have to do that, for example to find a metadata file without knowing
its genstamp) and to make inspecting things easier. Plus, it allows us to stay
compatible with systems that don't handle giant directories well.
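To make the {{readdir}} case concrete, here's a rough sketch (not the actual datanode code; the class and helper names are made up) of what finding a block's meta file looks like when the genstamp isn't known: the only option is to list the directory and match on the block ID prefix, so the cost scales with the number of entries in that directory.
{code:java}
import java.io.File;
import java.io.FileNotFoundException;

// Sketch only: locate "blk_<id>_<genstamp>.meta" when the genstamp is unknown.
// This is not the datanode's actual implementation; it just illustrates why the
// lookup degrades into a full directory listing (readdir).
public class MetaFileLookupSketch {
  static File findMetaFile(File blockDir, long blockId) throws FileNotFoundException {
    String prefix = "blk_" + blockId + "_";
    // Listing touches every entry, so the cost grows with the number of files
    // per directory, which is why keeping directories small still helps here.
    File[] matches = blockDir.listFiles(
        (dir, name) -> name.startsWith(prefix) && name.endsWith(".meta"));
    if (matches == null || matches.length == 0) {
      throw new FileNotFoundException("no meta file for block " + blockId + " in " + blockDir);
    }
    return matches[0];
  }
}
{code}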
bq. After ~4 million blocks we would start putting more than 256 blocks in each
leaf subdirectory. With every 4M blocks, we'd add 256 files to each leaf. I
think this is fine since 4 million blocks itself is going to be very unlikely.
I recall as late as Vista NTFS directory listings would get noticeably slow
with thousands of files per directory. Is there any performance loss with
always having three levels of subdirectories, restricting each to 256 children
at the most?
It's an interesting idea, but as you pointed out, even getting to 1,024 blocks
per subdirectory (which still isn't "thousands," just a single thousand) under
James' scheme would require 16 million blocks. At that point, it seems like
there will be other problems. We can always evolve the directory and metadata
naming structure again once 16 million blocks is on the horizon (and we will
probably have to do other things too, like investigate off-heap memory
storage).
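For reference, here's a minimal sketch of what an ID-based layout can look like. The two-level fan-out and the bit positions below are illustrative assumptions, not necessarily the constants in James' patch; the point is just that a block ID maps to a fixed directory, so directories never need to be split and ReplicaInfo doesn't need to track the location.
{code:java}
import java.io.File;

// Illustrative sketch only: derive a replica's directory from its block ID.
// The 256-way, two-level fan-out and the byte positions are assumptions for
// illustration, not the actual constants from the HDFS-6482 patch.
public class BlockIdLayoutSketch {
  static File blockIdToDir(File finalizedDir, long blockId) {
    int d1 = (int) ((blockId >> 16) & 0xFF);  // first-level subdirectory
    int d2 = (int) ((blockId >> 8) & 0xFF);   // second-level subdirectory
    return new File(finalizedDir, "subdir" + d1 + File.separator + "subdir" + d2);
  }

  public static void main(String[] args) {
    File finalized = new File("finalized");
    System.out.println(blockIdToDir(finalized, 1073741825L));
    // The same block ID always maps to the same directory, so there is nothing
    // to split when a directory fills up and nothing per-replica to remember.
  }
}
{code}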
> Use block ID-based block layout on datanodes
> --------------------------------------------
>
> Key: HDFS-6482
> URL: https://issues.apache.org/jira/browse/HDFS-6482
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.5.0
> Reporter: James Thomas
> Assignee: James Thomas
> Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch
>
>
> Right now blocks are placed into directories that are split into many
> subdirectories when capacity is reached. Instead we can use a block's ID to
> determine the path it should go in. This eliminates the need for the LDir
> data structure that facilitates the splitting of directories when they reach
> capacity as well as fields in ReplicaInfo that keep track of a replica's
> location.
> An extension of the work in HDFS-3290.