[
https://issues.apache.org/jira/browse/HDFS-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165389#comment-14165389
]
Kihwal Lee commented on HDFS-7174:
----------------------------------
bq. As Yi Liu pointed out, the current patch has a problem. If we go back and
forth across switchingThreshold (say, by repeatedly adding and removing a
single element in a directory), we pay a very high cost. To prevent this, the
threshold for converting an INodeHashedArrayList back to a simple INodeArrayList
should be lower than the threshold for doing the opposite conversion.
There is a low watermark, which is 90% of the conversion threshold, so it
won't flip back and forth like that.
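The hysteresis described above can be sketched as follows. This is a minimal illustration, not the actual patch: the class and field names (ChildrenStore, CONVERSION_THRESHOLD, LOW_WATERMARK) and the threshold value are hypothetical, and plain String children stand in for INodes. The point is that the downgrade happens only below 90% of the upgrade threshold, so a single add/remove around the threshold does not flip the representation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch (hypothetical names and values, not HDFS code):
// a children store that upgrades from a plain list to a hashed structure
// at CONVERSION_THRESHOLD, but only downgrades below the low watermark.
class ChildrenStore {
    static final int CONVERSION_THRESHOLD = 1000;  // illustrative value
    static final double LOW_WATERMARK = 0.9;       // 90% of the threshold

    private List<String> list = new ArrayList<>();
    private Set<String> hashed = null;             // non-null once converted

    boolean isHashed() { return hashed != null; }

    void add(String child) {
        if (hashed != null) {
            hashed.add(child);
        } else {
            list.add(child);
            if (list.size() >= CONVERSION_THRESHOLD) {
                // upgrade to the hashed representation at the threshold
                hashed = new HashSet<>(list);
                list = null;
            }
        }
    }

    void remove(String child) {
        if (hashed != null) {
            hashed.remove(child);
            // downgrade only below 90% of the threshold, so repeatedly
            // adding and removing one element near the threshold does not
            // bounce between the two representations
            if (hashed.size() < (int) (CONVERSION_THRESHOLD * LOW_WATERMARK)) {
                list = new ArrayList<>(hashed);
                hashed = null;
            }
        } else {
            list.remove(child);
        }
    }

    int size() { return hashed != null ? hashed.size() : list.size(); }
}
```

With these illustrative numbers, a store holding exactly 1000 children stays hashed through single-element add/remove cycles; it converts back only once it shrinks below 900.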
> Support for more efficient large directories
> --------------------------------------------
>
> Key: HDFS-7174
> URL: https://issues.apache.org/jira/browse/HDFS-7174
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-7174.new.patch, HDFS-7174.patch, HDFS-7174.patch
>
>
> When the number of children under a directory grows very large, insertion
> becomes very costly. E.g. creating 1M entries takes 10s of minutes. This is
> because the complexity of an insertion is O(n). As the size of the list
> grows, the total overhead grows as O(n^2) (the integral of a linear
> function). It also causes allocations and copies of big arrays.
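The quadratic total cost quoted above follows from summing the per-insertion cost. A back-of-the-envelope sketch (illustrative only; SortedInsertCost and worstCaseMoves are hypothetical names, not HDFS code):

```java
// Inserting the i-th child into a sorted array shifts up to i existing
// entries, so building a directory of n children moves at most
// 0 + 1 + ... + (n-1) = n(n-1)/2 entries in total, which grows as O(n^2).
class SortedInsertCost {
    static long worstCaseMoves(long n) {
        return n * (n - 1) / 2;
    }
}
```

For n = 1,000,000 this works out to roughly 5 * 10^11 element moves, consistent with creation of 1M entries taking tens of minutes.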
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)