Kihwal Lee created HDFS-5497:
--------------------------------
Summary: Performance may suffer when UnderReplicatedBlocks is used heavily
Key: HDFS-5497
URL: https://issues.apache.org/jira/browse/HDFS-5497
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Reporter: Kihwal Lee
Currently UnderReplicatedBlocks uses LightWeightLinkedSet with the default
initial size of 16. If there are a lot of under-replicated blocks, insertions
and removals can become very expensive, since the backing array has to be
repeatedly reallocated and its elements reinserted as the set grows and shrinks.
We see 450K to 1M under-replicated blocks during start-up, which typically go
away soon after the last few datanodes join. With 450K under-replicated blocks,
replication queue initialization re-allocates the underlying array 15 times
and reinserts elements over 1M times. As block reports come in and the queue
drains, it goes through the reverse (shrinking) process. I think this is one of
the reasons why the initial block reports after leaving safe mode can take a
very long time to process.
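For illustration, the arithmetic behind those numbers can be sketched as
follows. This is a simplified model only: it assumes the backing array doubles
each time it fills up and that every expansion reinserts all elements currently
in the set; the exact load-factor behaviour of LightWeightLinkedSet is not
modelled here.

{code:java}
// Back-of-the-envelope sketch only; assumes doubling growth from the
// default capacity of 16 and a full rehash of all elements at each expansion.
public class GrowthCostSketch {
    public static void main(String[] args) {
        final int blocks = 450_000;   // under-replicated blocks seen at start-up
        long capacity = 16;           // default initial size
        int expansions = 0;
        long rehashed = 0;            // elements reinserted during expansions

        while (capacity < blocks) {
            rehashed += capacity;     // every current element is reinserted
            capacity *= 2;
            expansions++;
        }

        // ~15 expansions; ~524K re-insertions on top of the 450K fresh inserts,
        // i.e. on the order of a million insert operations in total.
        System.out.println(expansions + " expansions, "
            + rehashed + " re-insertions, "
            + (rehashed + blocks) + " total insert operations");
    }
}
{code}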
With a larger initial/minimum size, the processing time becomes significantly shorter.
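As a rough check of that last point: in the same simplified model, a minimum
capacity already large enough for the start-up backlog avoids the expansions
entirely (and a capacity floor likewise avoids repeated shrinking as the queue
drains). The 524_288 value below is only an example for illustration, not a
proposed default.

{code:java}
// Same simplified model as above, with a hypothetical larger minimum capacity.
// 524_288 is an example value only, not a proposed default.
public class LargerMinimumSketch {
    public static void main(String[] args) {
        final int blocks = 450_000;
        long capacity = 524_288;      // hypothetical initial/minimum size
        int expansions = 0;
        long rehashed = 0;
        while (capacity < blocks) {   // never true: the backlog already fits
            rehashed += capacity;
            capacity *= 2;
            expansions++;
        }
        // prints: 0 expansions, 0 re-insertions
        System.out.println(expansions + " expansions, " + rehashed + " re-insertions");
    }
}
{code}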