[ 
https://issues.apache.org/jira/browse/HDFS-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-1391:
-----------------------------------

    Attachment: excessReplicas2.txt

Patch can also be reviewed at https://reviews.apache.org/r/196/

Merged patch with latest trunk.

When exiting safemode, we walk through all the blocks, and if a block has 
excess replicas we insert it into overReplicatedBlocks (we do not delete the 
excess replicas right then and there). Then we exit safemode. The 
ReplicationMonitor thread then asynchronously processes each block in the 
overReplicatedBlocks data structure, determining and deleting its excess 
replicas. The chooseExcessReplicas method (which can be compute-heavy at 
times) is now called without the FSNamesystem lock.
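
To illustrate the idea for reviewers, here is a minimal, self-contained sketch 
of the pattern described above. It is not the code in the attached patch: the 
class DeferredExcessReplicaSketch, its fields, the method signatures, and the 
"drop the trailing replicas" policy are all hypothetical stand-ins, and the 
re-check under the lock before deleting is an assumption about how one might 
guard against state changing while the lock is released.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model; NOT the actual FSNamesystem code.
class DeferredExcessReplicaSketch {
    // Stands in for the FSNamesystem lock.
    private final ReentrantLock namesystemLock = new ReentrantLock();
    // Stands in for the blocksmap: block id -> datanodes holding a replica.
    private final Map<Long, List<String>> blocksMap = new HashMap<>();
    // Blocks queued for later excess-replica processing.
    private final Queue<Long> overReplicatedBlocks = new ConcurrentLinkedQueue<>();
    private final int targetReplication;

    DeferredExcessReplicaSketch(int targetReplication) {
        this.targetReplication = targetReplication;
    }

    void addBlock(long blockId, List<String> replicas) {
        namesystemLock.lock();
        try {
            blocksMap.put(blockId, new ArrayList<>(replicas));
        } finally {
            namesystemLock.unlock();
        }
    }

    // On leaving safemode: only queue over-replicated blocks, delete nothing.
    int leaveSafeMode() {
        namesystemLock.lock();
        try {
            int queued = 0;
            for (Map.Entry<Long, List<String>> e : blocksMap.entrySet()) {
                if (e.getValue().size() > targetReplication) {
                    overReplicatedBlocks.add(e.getKey());
                    queued++;
                }
            }
            return queued;
        } finally {
            namesystemLock.unlock();
        }
    }

    // Placeholder policy (an assumption): treat the trailing replicas as excess.
    // Takes a snapshot as input, so it needs no lock.
    static List<String> chooseExcessReplicas(List<String> replicas, int target) {
        return new ArrayList<>(replicas.subList(target, replicas.size()));
    }

    // One pass of the monitor thread; returns the number of replicas deleted.
    int processOverReplicatedBlocks() {
        int deleted = 0;
        Long blockId;
        while ((blockId = overReplicatedBlocks.poll()) != null) {
            List<String> snapshot;
            namesystemLock.lock();
            try {
                List<String> current = blocksMap.get(blockId);
                if (current == null) {
                    continue; // block disappeared in the meantime
                }
                snapshot = new ArrayList<>(current);
            } finally {
                namesystemLock.unlock();
            }
            if (snapshot.size() <= targetReplication) {
                continue;
            }
            // The potentially expensive choice runs with the lock released.
            List<String> excess = chooseExcessReplicas(snapshot, targetReplication);
            namesystemLock.lock();
            try {
                List<String> current = blocksMap.get(blockId);
                if (current != null) {
                    for (String node : excess) {
                        // Re-check so we never drop below the target.
                        if (current.size() > targetReplication && current.remove(node)) {
                            deleted++;
                        }
                    }
                }
            } finally {
                namesystemLock.unlock();
            }
        }
        return deleted;
    }

    int replicaCount(long blockId) {
        namesystemLock.lock();
        try {
            List<String> r = blocksMap.get(blockId);
            return r == null ? 0 : r.size();
        } finally {
            namesystemLock.unlock();
        }
    }

    public static void main(String[] args) {
        DeferredExcessReplicaSketch ns = new DeferredExcessReplicaSketch(3);
        ns.addBlock(1L, Arrays.asList("dn1", "dn2", "dn3", "dn4", "dn5"));
        ns.addBlock(2L, Arrays.asList("dn1", "dn2", "dn3"));
        System.out.println("queued on safemode exit: " + ns.leaveSafeMode());
        System.out.println("deleted by monitor pass: " + ns.processOverReplicatedBlocks());
    }
}
```

In this sketch the safemode exit path does only a cheap size comparison per 
block while holding the lock; the per-block choice of which replicas to drop 
is left to the later monitor pass.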

For a cluster with around 110 million blocks, the "bin/hadoop dfsadmin 
-safemode leave" command used to take about 9 minutes before this patch. With 
this patch, it takes about 55 seconds!

> Exiting safemode takes a long time when there are lots of blocks in the HDFS
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-1391
>                 URL: https://issues.apache.org/jira/browse/HDFS-1391
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: excessReplicas.1_trunk.txt, excessReplicas2.txt
>
>
> When the namenode decides to exit safemode, it acquires the FSNamesystem 
> lock and then iterates over all blocks in the blocksmap to determine whether 
> any block has excess replicas. This call takes upwards of 5 minutes on a 
> cluster that has 100 million blocks, which significantly delays namenode 
> restart.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
