[
https://issues.apache.org/jira/browse/HDFS-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819407#comment-13819407
]
Daryn Sharp commented on HDFS-5497:
-----------------------------------
Would it maybe make sense for the minimum size to be a percent of the total
blocks instead of a fixed size? I.e., default to 0.3, assuming up to 1/3 of
replicas aren't available? A "don't change capacity" flag would be very useful,
or perhaps making the data structure much more conservative about resizing.
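
A rough sketch of what a fraction-based minimum capacity might look like; the
class, method, and constant names below are hypothetical illustrations, not
part of the existing LightWeightLinkedSet API:

  // Hypothetical sketch of a fraction-based minimum capacity, as suggested
  // above. Names are illustrative only.
  public class CapacityPolicy {
    /** Assume up to ~1/3 of all blocks may be under-replicated at start-up. */
    static final double MIN_CAPACITY_FRACTION = 0.3;
    /** Never go below the current default. */
    static final int ABSOLUTE_MIN_CAPACITY = 16;

    /** Minimum backing-array size for a namespace with the given block count. */
    static int minimumCapacity(long totalBlocks) {
      long suggested = (long) (totalBlocks * MIN_CAPACITY_FRACTION);
      int capacity = ABSOLUTE_MIN_CAPACITY;
      // Round up to a power of two so the table keeps its usual doubling shape,
      // and cap it to stay within array-length limits.
      while (capacity < suggested && capacity < (1 << 30)) {
        capacity <<= 1;
      }
      return capacity;
    }

    public static void main(String[] args) {
      // E.g. a 3M-block namespace would start the set at ~1M slots instead of 16.
      System.out.println(minimumCapacity(3_000_000L));
    }
  }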
> Performance may suffer when UnderReplicatedBlocks is used heavily
> -----------------------------------------------------------------
>
> Key: HDFS-5497
> URL: https://issues.apache.org/jira/browse/HDFS-5497
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Kihwal Lee
>
> Currently UnderReplicatedBlocks uses LightWeightLinkedSet with the default
> initial size of 16. If there are a lot of under-replicated blocks, insertion
> and removal can be very expensive.
> We see 450K to 1M under-replicated blocks during start-up, which typically go
> away soon after the last few data nodes join. With 450K under-replicated blocks,
> replication queue initialization would re-allocate the underlying array 15
> times and reinsert elements over 1M times. As block reports come in, it will
> go through the reverse. I think this is one of the reasons why initial block
> reports after leaving safe mode can take a very long time to process.
> With a larger initial/minimum size, the timing gets significantly shorter.
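
As a rough sanity check on the numbers in the description, the following is a
standalone illustration, not actual LightWeightLinkedSet code; it assumes a
simple double-when-full policy, whereas the real set also applies a load
factor and shrinks again as the queue drains, so the true cost is somewhat
higher:

  // Back-of-the-envelope illustration of the resize cost described above.
  // Models an array that starts at 16 entries and doubles whenever it fills up.
  public class ResizeCost {
    public static void main(String[] args) {
      final long targetElements = 450_000;  // under-replicated blocks at start-up
      long capacity = 16;                   // default initial size
      int reallocations = 0;
      long entriesMoved = 0;
      while (capacity < targetElements) {
        entriesMoved += capacity;           // existing entries are reinserted
        capacity *= 2;
        reallocations++;
      }
      // Prints 15 reallocations and ~524K moved entries; together with the
      // 450K initial insertions, the structure performs roughly a million
      // insert operations before it is even fully populated.
      System.out.println("reallocations = " + reallocations);
      System.out.println("entries moved = " + entriesMoved);
    }
  }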