[
https://issues.apache.org/jira/browse/HDFS-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153924#comment-14153924
]
Kihwal Lee commented on HDFS-7174:
----------------------------------
The threshold is set to 50,000 in the patch. This preserves the existing
listing semantics for regular applications, while special applications can
benefit from the performance improvement. The directory converts back to the
normal list when the number of children falls to 90% of the threshold.
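
The switching rule described above can be sketched as a small state tracker.
This is only an illustration, not code from the patch: the class and method
names are hypothetical, and it assumes the directory switches to the
large-directory form once the child count reaches the threshold. Only the
50,000 threshold and the 90% revert point come from the comment.

```java
// Hypothetical sketch of the threshold rule; not taken from HDFS-7174.patch.
class DirectoryModeSketch {
    // Threshold stated in the comment.
    static final int THRESHOLD = 50_000;
    // 90% of the threshold: the point at which the comment says the
    // directory converts back to the normal list.
    static final int REVERT_AT = THRESHOLD * 9 / 10;

    private int childCount = 0;
    private boolean largeMode = false;

    void addChild() {
        childCount++;
        // Assumption: conversion happens when the count reaches the threshold.
        if (!largeMode && childCount >= THRESHOLD) {
            largeMode = true;
        }
    }

    void removeChild() {
        if (childCount == 0) {
            throw new IllegalStateException("directory has no children");
        }
        childCount--;
        // Revert only after shrinking to 90% of the threshold, so a directory
        // hovering near 50,000 entries does not convert back and forth.
        if (largeMode && childCount <= REVERT_AT) {
            largeMode = false;
        }
    }

    boolean isLargeMode() {
        return largeMode;
    }

    int size() {
        return childCount;
    }
}
```

The gap between the two switch points (50,000 up, 45,000 down) is what keeps
a directory near the threshold from flapping between representations.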
> Support for more efficient large directories
> --------------------------------------------
>
> Key: HDFS-7174
> URL: https://issues.apache.org/jira/browse/HDFS-7174
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-7174.patch
>
>
> When the number of children under a directory grows very large, insertion
> becomes very costly. For example, creating 1M entries takes tens of minutes.
> This is because the complexity of a single insertion is O(n). As the list
> grows, the total overhead grows as n^2 (the integral of a linear function).
> It also causes allocations and copies of big arrays.
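
To make the quadratic growth concrete, here is a minimal sketch that counts
how many elements are shifted when keys are inserted one by one into a sorted
array. It is an illustration only, not the NameNode's actual data structure;
the class and method names are hypothetical. In the worst case (every key
sorts before all existing ones) the i-th insertion shifts i elements, so n
insertions shift 0 + 1 + ... + (n-1) = n(n-1)/2 elements in total.

```java
import java.util.Arrays;

// Hypothetical sketch: counts element shifts for sorted-array insertion.
class SortedArrayInsertCost {
    // Inserts each key into a sorted int array and returns the total number
    // of elements that had to be moved to make room.
    static long shiftsForInserts(int[] keys) {
        int[] data = new int[Math.max(1, keys.length)];
        int size = 0;
        long shifts = 0;
        for (int key : keys) {
            // Finding the position is O(log n) ...
            int pos = Arrays.binarySearch(data, 0, size, key);
            if (pos < 0) {
                pos = -(pos + 1); // convert "not found" result to insertion point
            }
            // ... but making room is O(n): everything after pos moves by one.
            System.arraycopy(data, pos, data, pos + 1, size - pos);
            shifts += size - pos;
            data[pos] = key;
            size++;
        }
        return shifts;
    }

    // Worst-case input: keys n, n-1, ..., 1, so each insert lands at index 0.
    static int[] descending(int n) {
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = n - i;
        }
        return keys;
    }
}
```

Doubling n in `shiftsForInserts(descending(n))` roughly quadruples the result,
which matches the n^2 growth described in the issue. Keys that arrive already
in ascending order cause no shifts at all in this sketch.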