[ https://issues.apache.org/jira/browse/HDFS-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819359#comment-13819359 ]

Kihwal Lee commented on HDFS-5497:
----------------------------------

I think it makes sense to increase the default minimum size to avoid repeatedly
growing and shrinking the array when there is only a small number of
under-replicated blocks.  In addition, I propose making the minimum size
configurable and making it an option to disable array growth/shrinkage for
users with a big namespace in their cluster.
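To illustrate the two proposed knobs, here is a minimal Python sketch of a toy chained hash set. `min_capacity` stands in for the configurable minimum size, and `resizable=False` stands in for the option to disable growth/shrinkage. All names here (`ResizableSet`, `churn`, and the 0.75/0.2 load factors) are hypothetical illustrations. They are not Hadoop's LightWeightLinkedSet API or any real configuration key.

```python
class ResizableSet:
    """Toy chained hash set (hypothetical; not Hadoop's LightWeightLinkedSet)."""

    def __init__(self, min_capacity=16, resizable=True, max_load=0.75, min_load=0.2):
        self.min_capacity = min_capacity  # stand-in for a configurable minimum size
        self.resizable = resizable        # False = never grow or shrink the array
        self.max_load = max_load
        self.min_load = min_load
        self.buckets = [[] for _ in range(min_capacity)]
        self.size = 0
        self.resizes = 0                  # number of array reallocations so far

    def _bucket(self, item):
        return self.buckets[hash(item) % len(self.buckets)]

    def add(self, item):
        b = self._bucket(item)
        if item in b:
            return False
        b.append(item)
        self.size += 1
        # Double the array once the load factor exceeds max_load.
        if self.resizable and self.size > len(self.buckets) * self.max_load:
            self._resize(len(self.buckets) * 2)
        return True

    def remove(self, item):
        b = self._bucket(item)
        if item not in b:
            return False
        b.remove(item)
        self.size -= 1
        cap = len(self.buckets)
        # Halve the array when it gets sparse, but never below min_capacity.
        if self.resizable and cap > self.min_capacity and self.size < cap * self.min_load:
            self._resize(max(self.min_capacity, cap // 2))
        return True

    def _resize(self, new_capacity):
        old = self.buckets
        self.buckets = [[] for _ in range(new_capacity)]
        for bucket in old:           # every element is re-inserted into the new array
            for item in bucket:
                self._bucket(item).append(item)
        self.resizes += 1


def churn(s, n):
    """Insert n items, then remove them all; return how many reallocations happened."""
    for i in range(n):
        s.add(i)
    for i in range(n):
        s.remove(i)
    return s.resizes


print(churn(ResizableSet(min_capacity=16), 10_000))     # 20 (10 grows + 10 shrinks)
print(churn(ResizableSet(min_capacity=16384), 10_000))  # 0
print(churn(ResizableSet(resizable=False), 10_000))     # 0, but with long bucket chains
```

With `resizable=False` and a small array, each bucket degrades to a long chain, so disabling resizing only pays off when it is combined with a large minimum size.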

> Performance may suffer when UnderReplicatedBlocks is used heavily
> -----------------------------------------------------------------
>
>                 Key: HDFS-5497
>                 URL: https://issues.apache.org/jira/browse/HDFS-5497
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Kihwal Lee
>
> Currently UnderReplicatedBlocks uses LightWeightLinkedSet with the default 
> initial size of 16.  If there are a lot of under-replicated blocks, insertion 
> and removal can be very expensive.
> We see 450K to 1M under-replicated blocks during start-up, which typically go
> away as soon as the last few data nodes join. With 450K under-replicated
> blocks, replication queue initialization would re-allocate the underlying
> array 15 times and reinsert elements over 1M times.  As block reports come in,
> it will go through the reverse.  I think this is one of the reasons why the
> initial block reports after leaving safe mode can take a very long time to
> process.
> With a larger initial/minimum size, the timing gets significantly shorter. 
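The reallocation count in the description can be checked with a rough model. This sketch assumes that the backing array starts at 16 and doubles only when an insert would overflow it, which corresponds to a load factor of 1.0. That policy is an assumption. The real LightWeightLinkedSet growth policy may use different thresholds, so the number of re-inserted elements in this model is not expected to match the figure above exactly. The function name `count_growth` is hypothetical.

```python
def count_growth(n_elements, initial_capacity=16, max_load_factor=1.0):
    """Model a doubling hash table while n_elements are inserted one by one.

    Returns (reallocations, elements_rehashed, final_capacity).
    Simplified model only; not Hadoop's LightWeightLinkedSet code.
    """
    capacity = initial_capacity
    size = 0
    reallocations = 0
    rehashed = 0
    for _ in range(n_elements):
        # Grow before inserting if the new element would exceed the threshold.
        while size + 1 > capacity * max_load_factor:
            rehashed += size  # every existing element is copied into the new array
            capacity *= 2
            reallocations += 1
        size += 1
    return reallocations, rehashed, capacity


print(count_growth(450_000))                            # (15, 524272, 524288)
print(count_growth(450_000, initial_capacity=1 << 20))  # (0, 0, 1048576)
```

Under this assumption, the model reproduces the 15 reallocations for 450K blocks. With a minimum size that is already large enough, there are none.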



--
This message was sent by Atlassian JIRA
(v6.1#6144)
