[ https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhe Zhang updated HDFS-9381:
----------------------------
    Component/s: erasure-coding

> When same block came for replication for Striped mode, we can move that block 
> to PendingReplications
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9381
>                 URL: https://issues.apache.org/jira/browse/HDFS-9381
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding, namenode
>    Affects Versions: 3.0.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>
> Currently, in the replication flow for striped blocks, we just return null
> if the block already exists in pendingReplications:
> {code}
> if (block.isStriped()) {
>   if (pendingNum > 0) {
>     // Wait for the previous recovery to finish.
>     return null;
>   }
>   // ... remaining striped-block handling ...
> }
> {code}
>  Here, if neededReplications contains only a few blocks (by default, fewer
> than numliveNodes * 2), the same blocks can be picked again from
> neededReplications, because returning null does not remove the element from
> neededReplications. Since this replication process has to take the
> FSNamesystem lock, we may spend time unnecessarily in every loop.
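A toy Java model of the loop described above may make the wasted work concrete. It is a sketch under stated assumptions, not HDFS code: the Block class, countWastedScans, and the always-scan-from-the-front selection are hypothetical simplifications of how the NameNode picks blocks from neededReplications, and the 6-blocks-per-iteration figure assumes 3 live nodes with the default multiplier of 2 mentioned above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Toy model (not HDFS code) of the current striped-block path: a block whose
 * previous recovery is still pending is skipped via "return null" but stays
 * in neededReplications, so each iteration examines it again.
 */
public class ReplicationLoopSketch {

    /** Hypothetical stand-in for an entry in neededReplications. */
    static final class Block {
        final long id;
        final boolean striped;
        final int pendingNum; // replicas already tracked in pendingReplications

        Block(long id, boolean striped, int pendingNum) {
            this.id = id;
            this.striped = striped;
            this.pendingNum = pendingNum;
        }
    }

    /**
     * Runs the selection loop and returns how many times a block was examined
     * only to be skipped (work the real NameNode does under the namesystem lock).
     */
    static int countWastedScans(List<Block> needed, int blocksPerIteration, int iterations) {
        int wasted = 0;
        for (int it = 0; it < iterations; it++) {
            int examined = 0;
            Iterator<Block> iter = needed.iterator();
            while (iter.hasNext() && examined < blocksPerIteration) {
                Block b = iter.next();
                examined++;
                if (b.striped && b.pendingNum > 0) {
                    wasted++;      // mirrors "return null": skipped but still queued
                } else {
                    iter.remove(); // scheduled, so it leaves neededReplications
                }
            }
        }
        return wasted;
    }

    public static void main(String[] args) {
        List<Block> needed = new ArrayList<>();
        needed.add(new Block(1L, true, 1));  // striped, recovery already pending
        needed.add(new Block(2L, false, 0)); // ordinary block, scheduled once
        // 3 live nodes * default multiplier 2 = up to 6 blocks per iteration.
        System.out.println(countWastedScans(needed, 6, 10)); // prints 10
    }
}
```

The ordinary block leaves the queue after the first pass, while the striped block with a pending recovery is re-examined on every one of the 10 iterations.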
> So my suggested improvement is:
>  Instead of just returning null, how about incrementing pendingReplications
> for this block and removing it from neededReplications? Another point to
> consider: adding a block to pendingReplications generally requires a target,
> i.e. the node to which we issued the replication command. Later, after the
> replication succeeds and the DN reports the block, the NN removes it from
> pendingReplications in addBlock.
>  Since this block has just been picked from neededReplications, we have not
> selected a target yet. So which target should be passed to
> pendingReplications if we add this block? One option I am considering is to
> pass srcNode itself as the target in this special case. If the block is
> really missing, srcNode will not report it, so the block will not be removed
> from pendingReplications. When the pending entry times out, the block will
> be considered for replication again, and at that point the actual
> replication target will be chosen.
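The proposal can be sketched in Java roughly as follows. This is a hypothetical model, not the actual BlockManager code: the class, its method names, the string node IDs, the timestamp refresh on each added target, and the timeout value are all assumptions made for illustration. It shows the three transitions described above: the block leaves neededReplications with srcNode as a placeholder target, a report from a recorded target clears the pending entry, and a timed-out entry is re-queued so a real target can be chosen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (not HDFS code) of the proposal: when a striped block already has a
 * pending recovery, record it in pendingReplications with srcNode as its target
 * and remove it from neededReplications. If the block is really missing, srcNode
 * never reports it, so the entry times out and the block is re-queued.
 */
public class PendingWithSrcTargetSketch {

    /** Hypothetical pending entry: expected targets plus a timestamp. */
    static final class PendingEntry {
        final List<String> targets = new ArrayList<>();
        long timestampMillis; // refreshed whenever a target is added (sketch choice)
    }

    final Set<Long> neededReplications = new LinkedHashSet<>();
    final Map<Long, PendingEntry> pendingReplications = new HashMap<>();
    final long timeoutMillis;

    PendingWithSrcTargetSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Proposed replacement for "return null" when a striped block has pendingNum > 0. */
    void deferWithSrcAsTarget(long blockId, String srcNode, long nowMillis) {
        PendingEntry e = pendingReplications.computeIfAbsent(blockId, id -> new PendingEntry());
        e.targets.add(srcNode);             // srcNode stands in for a not-yet-chosen target
        e.timestampMillis = nowMillis;
        neededReplications.remove(blockId); // stop re-scanning it on every iteration
    }

    /** Called when a DataNode reports the block (the NN addBlock path in the text). */
    void blockReported(long blockId, String dataNode) {
        PendingEntry e = pendingReplications.get(blockId);
        if (e != null && e.targets.remove(dataNode) && e.targets.isEmpty()) {
            pendingReplications.remove(blockId);
        }
    }

    /** Moves timed-out entries back to neededReplications so real targets get chosen. */
    void checkTimeouts(long nowMillis) {
        pendingReplications.entrySet().removeIf(entry -> {
            boolean expired = nowMillis - entry.getValue().timestampMillis >= timeoutMillis;
            if (expired) {
                neededReplications.add(entry.getKey());
            }
            return expired;
        });
    }

    public static void main(String[] args) {
        PendingWithSrcTargetSketch nn = new PendingWithSrcTargetSketch(300_000L);
        nn.neededReplications.add(42L);
        nn.deferWithSrcAsTarget(42L, "dn-src", 0L);
        System.out.println(nn.neededReplications.contains(42L)); // prints false
        nn.checkTimeouts(300_000L); // srcNode never reported the missing block
        System.out.println(nn.neededReplications.contains(42L)); // prints true
    }
}
```

Because srcNode is only a placeholder, the entry normally ends through checkTimeouts rather than blockReported, which is exactly the "re-queue after timeout" behavior the proposal relies on.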
>  
>  So  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
