[ https://issues.apache.org/jira/browse/HDFS-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158305#comment-14158305 ]

Konstantin Shvachko commented on HDFS-7174:
-------------------------------------------

This is a long-standing issue, which people tend to solve externally by 
creating artificial hierarchies of directories, similar to the block directories on 
DNs. An internal solution would be great to have. I am probably late to the 
party, but for whatever it's worth:
Did you consider using balanced trees for the inode lists, something like B-trees?
If the (btree) block size is large enough, it will work as an array (single-node 
tree), while if it grows it will split itself automatically.
Then for most directories it will retain the current behaviour, but one could also 
have a completely flat namespace without worrying about long arrays.
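
To make the suggestion concrete, here is a rough sketch of the idea (purely 
illustrative, not taken from any patch on this issue; class and field names 
are made up). A red-black tree via TreeMap stands in for a proper B-tree, 
since the JDK has no B-tree map:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical adaptive children list: a sorted ArrayList (binary search plus
 * shift on insert) while small, a balanced tree once it grows past a
 * threshold. Names and the threshold value are illustrative only.
 */
class AdaptiveChildrenList {
  private static final int SPLIT_THRESHOLD = 4096; // analogous to a B-tree node size

  private List<String> small = new ArrayList<>();  // kept sorted while small
  private TreeMap<String, String> large = null;    // used once we exceed the threshold

  void add(String name) {
    if (large != null) {
      large.put(name, name);                // O(log n) once we have switched
      return;
    }
    int pos = Collections.binarySearch(small, name);
    if (pos < 0) {
      small.add(-(pos + 1), name);          // O(n) shift, acceptable for small lists
    }
    if (small.size() > SPLIT_THRESHOLD) {   // "split": migrate to the tree form
      large = new TreeMap<>();
      for (String s : small) {
        large.put(s, s);
      }
      small = null;
    }
  }

  boolean contains(String name) {
    return large != null ? large.containsKey(name)
                         : Collections.binarySearch(small, name) >= 0;
  }
}
{code}

In a real B-tree each node holds many entries, so a small directory stays a 
single array-like node with no extra pointer overhead, and only directories 
that actually grow large pay the cost of splitting; the hard threshold switch 
above is just the simplest way to show the same shape with JDK collections.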

> Support for more efficient large directories
> --------------------------------------------
>
>                 Key: HDFS-7174
>                 URL: https://issues.apache.org/jira/browse/HDFS-7174
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-7174.new.patch, HDFS-7174.patch, HDFS-7174.patch
>
>
> When the number of children under a directory grows very large, insertion 
> becomes very costly.  E.g. creating 1M entries takes tens of minutes.  This is 
> because the complexity of an insertion is O(n). As the size of the list 
> grows, the total overhead grows as O(n^2) (the integral of a linear function).  
> It also causes allocations and copies of big arrays.
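
(A back-of-the-envelope check of the quadratic claim, not a measurement from 
the issue: inserting the i-th child into a sorted array moves on the order of 
i elements, so creating n children moves roughly 1 + 2 + ... + n ~ n^2/2 
elements in total; for n = 10^6 that is on the order of 5 * 10^11 element 
moves, before counting the repeated reallocation and copying of the backing 
array.)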


