[ 
https://issues.apache.org/jira/browse/HADOOP-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HADOOP-940:
------------------------------------

    Attachment: pendingReplication.patch

First version of code. Review comments welcome.

The patch adds a new thread that tracks all replications currently in progress. 
If a replication request does not complete within 10 minutes, the block is put 
back in neededReplication.
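
To make the timeout behaviour concrete, here is a minimal sketch of such a tracker. It is not the code in the attached patch: the class name PendingReplicationTracker, its methods, and the use of string block ids are all hypothetical, and the current time is passed in explicitly so the logic can be exercised without a real background thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not taken from the patch.
class PendingReplicationTracker {
    private final long timeoutMillis;
    // block id -> time (ms) at which the replication request was issued
    private final Map<String, Long> pending = new HashMap<>();

    PendingReplicationTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record that a replication request for this block is in progress.
    synchronized void add(String blockId, long nowMillis) {
        pending.put(blockId, nowMillis);
    }

    // Called when the replication is reported as complete.
    synchronized void remove(String blockId) {
        pending.remove(blockId);
    }

    synchronized int size() {
        return pending.size();
    }

    // Remove and return every block whose request is older than the timeout.
    // The caller would put these blocks back in neededReplication.
    synchronized List<String> removeTimedOut(long nowMillis) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long tenMinutes = 10 * 60 * 1000L;
        PendingReplicationTracker tracker = new PendingReplicationTracker(tenMinutes);
        tracker.add("blk_1", 0L);
        tracker.add("blk_2", 0L);
        tracker.remove("blk_1"); // blk_1 completed in time
        // Only blk_2 is still pending once the timeout has passed.
        System.out.println(tracker.removeTimedOut(tenMinutes + 1));
    }
}
```

In a real name node, a monitor thread would presumably call removeTimedOut(System.currentTimeMillis()) at a fixed interval and re-queue whatever it returns.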

> pendingReplications of FSNamesystem is not informative
> ------------------------------------------------------
>
>                 Key: HADOOP-940
>                 URL: https://issues.apache.org/jira/browse/HADOOP-940
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.10.1
>            Reporter: Hairong Kuang
>         Attachments: pendingReplication.patch
>
>
> Currently, when a neededReplication block is scheduled to be replicated, it is 
> put into the pendingReplications queue. When it is no longer under-replicated, 
> it is pulled out of the pendingReplications queue. But the queue does not 
> provide any information such as how many targets have been chosen or which 
> data nodes those targets are. PendingReplications is also not consulted when 
> deciding whether a block is under-replicated. This may cause a block to be 
> over-replicated or lead to an inaccurate estimate of its replication priority.
> For example, when a block has 1 replica but its replication factor is 2, a 
> data node is chosen to replicate this block and the block is put in the 
> pendingReplications queue. If the block's replication factor is changed to 
> 3 before the block replication notification (the next block report) comes 
> in, the block will be put into the neededReplications queue again under the 
> assumption that it needs 2 more targets instead of 1. So the block will 
> end up with 4 replicas.
> I propose that we change pendingReplications to be a map from a block to the 
> chosen data nodes. Data nodes in both pendingReplications and blockMap are 
> counted when deciding the total number of replicas that a block has. When the 
> name node is notified that the block has been replicated to a chosen data 
> node, the data node is moved from pendingReplications to blockMap.
> Each chosen target is also associated with a timer indicating how long the 
> name node expects to wait for the block replication notification. The 
> pendingReplications queue needs to be scanned periodically to remove those 
> data nodes whose timers have expired.
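
The replica accounting proposed in the description can be sketched as follows. This is only an illustration under stated assumptions: ReplicaAccounting and its methods are hypothetical names, blocks and data nodes are plain strings, the confirmed map merely stands in for blockMap, and the per-target timers are left out to keep the example short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of counting pending targets as replicas.
class ReplicaAccounting {
    // block id -> data nodes with a confirmed replica (stand-in for blockMap)
    private final Map<String, Set<String>> confirmed = new HashMap<>();
    // block id -> data nodes chosen as targets but not yet confirmed
    private final Map<String, Set<String>> pendingTargets = new HashMap<>();

    void addConfirmed(String block, String node) {
        confirmed.computeIfAbsent(block, k -> new HashSet<>()).add(node);
    }

    // Record that a data node was chosen as a replication target.
    void schedule(String block, String node) {
        pendingTargets.computeIfAbsent(block, k -> new HashSet<>()).add(node);
    }

    // The name node learned that the node now holds the block:
    // move the node from the pending map to the confirmed map.
    void confirm(String block, String node) {
        Set<String> targets = pendingTargets.get(block);
        if (targets != null) {
            targets.remove(node);
            if (targets.isEmpty()) {
                pendingTargets.remove(block);
            }
        }
        addConfirmed(block, node);
    }

    // Confirmed replicas plus targets still in flight, each node counted once.
    int totalReplicas(String block) {
        Set<String> nodes = new HashSet<>(
                confirmed.getOrDefault(block, Collections.<String>emptySet()));
        nodes.addAll(pendingTargets.getOrDefault(block, Collections.<String>emptySet()));
        return nodes.size();
    }

    int additionalTargetsNeeded(String block, int replicationFactor) {
        return Math.max(0, replicationFactor - totalReplicas(block));
    }

    public static void main(String[] args) {
        ReplicaAccounting acc = new ReplicaAccounting();
        acc.addConfirmed("blk_1", "dn1"); // 1 existing replica, factor 2
        acc.schedule("blk_1", "dn2");     // one target chosen
        // Factor raised to 3 before the notification arrives:
        // only 1 more target is needed, not 2.
        System.out.println(acc.additionalTargetsNeeded("blk_1", 3));
    }
}
```

With the pending target counted, the scenario from the description asks for one extra target after the factor changes to 3, so the block ends up with 3 replicas rather than 4.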

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.