[ 
https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035185#comment-15035185
 ] 

Uma Maheswara Rao G commented on HDFS-9381:
-------------------------------------------

Thanks a lot for the discussions, Zhe, Jing and Walter.

Thanks, Walter, for the nice comments on the patch. I have attached a new patch 
to address them.
{quote}
 1. need to lock on readyForReplications.
+          if (unscheduledPendingReplications.remove(block)) {
+            readyForReplications.add(block);
+          }
{quote}
Good catch. Done.
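
To illustrate the locking point, here is a minimal sketch, not the patch itself. The class `ReadyQueueSketch`, its method names, and the use of `String` block IDs are assumptions made for illustration; only the field names `unscheduledPendingReplications` and `readyForReplications` come from the snippet quoted above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the patch's bookkeeping; not the actual HDFS class.
class ReadyQueueSketch {
  private final Set<String> unscheduledPendingReplications = new HashSet<String>();
  private final List<String> readyForReplications = new ArrayList<String>();

  void addUnscheduled(String block) {
    synchronized (unscheduledPendingReplications) {
      unscheduledPendingReplications.add(block);
    }
  }

  // Moves a block from the unscheduled set to the ready list.
  // Returns true only if the block was actually moved.
  boolean markReady(String block) {
    boolean removed;
    synchronized (unscheduledPendingReplications) {
      removed = unscheduledPendingReplications.remove(block);
    }
    if (removed) {
      // The ready list is read by another thread, so guard the add as well.
      synchronized (readyForReplications) {
        readyForReplications.add(block);
      }
    }
    return removed;
  }

  // Hands the ready blocks to the caller and empties the list under the same lock.
  List<String> drainReady() {
    synchronized (readyForReplications) {
      List<String> copy = new ArrayList<String>(readyForReplications);
      readyForReplications.clear();
      return copy;
    }
  }
}
```

The two locks are never held at the same time here, so the sketch cannot deadlock on lock ordering; the real patch may well use a single lock instead.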

{quote}
2. Here We can have some log.
       if (pendingNum > 0) {
         // Wait the previous recovery to finish.
+        pendingReplications.addToUnscheduledPendingReplication(block);
+        neededReplications.remove(block, priority);
         return null;
{quote}
Added a log message.
{quote}
3. Here need to check if block is already in readyForReplications. Otherwise 
it's possible the block appears in readyForReplications, and re-processed.
addToUnscheduledPendingReplication(BlockInfo block) {
{quote}
Yes. Doing this check makes sense and is safer.
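
A rough sketch of that check, under stated assumptions: `UnscheduledPendingSketch`, `markReady` and `readyCount` are hypothetical names, blocks are modelled as `String` IDs, and only `addToUnscheduledPendingReplication` and `readyForReplications` are names taken from the review comment.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the two collections discussed in the review.
class UnscheduledPendingSketch {
  private final Set<String> unscheduled = new LinkedHashSet<String>();
  private final Set<String> readyForReplications = new LinkedHashSet<String>();

  // Refuses blocks that are already waiting in readyForReplications,
  // so the same block cannot be queued for processing twice.
  synchronized boolean addToUnscheduledPendingReplication(String block) {
    if (readyForReplications.contains(block)) {
      return false;
    }
    return unscheduled.add(block);
  }

  // Moves a block to the ready set once its earlier recovery has finished.
  synchronized void markReady(String block) {
    if (unscheduled.remove(block)) {
      readyForReplications.add(block);
    }
  }

  synchronized int readyCount() {
    return readyForReplications.size();
  }
}
```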

{quote}
Speaking of the case Zhe Zhang mentioned above. Assume DN_1 has 1m blocks 
totally. So it takes 5000 iter to process all, which means about 4 hrs. If DN_2 
fails soon after DN_1, only neededReplications updated. If DN_2 fails long 
after DN_1, the previous task already finished so we schedule a new task.
{quote}
I think that when DN2 fails, the priority of the block should be higher than 
that of the previous entry. Since the selection respects priority, it should 
pick from the higher-priority queue first, and it will then most likely pick 
blocks that are already scheduled. Maybe not all of them, but at least some.
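
The priority argument can be sketched as follows. This is a simplified model, not the NameNode's actual queue implementation: `PriorityQueuesSketch` and its methods are hypothetical, and the sketch assumes that a lower index means a more urgent level.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical multi-level queue; index 0 is assumed to be the most urgent.
class PriorityQueuesSketch {
  private final List<Set<String>> levels = new ArrayList<Set<String>>();

  PriorityQueuesSketch(int numLevels) {
    for (int i = 0; i < numLevels; i++) {
      levels.add(new LinkedHashSet<String>());
    }
  }

  void add(String block, int priority) {
    levels.get(priority).add(block);
  }

  // Models a second DN failure: the block moves to a more urgent level.
  void update(String block, int oldPriority, int newPriority) {
    if (levels.get(oldPriority).remove(block)) {
      levels.get(newPriority).add(block);
    }
  }

  // Picks up to max blocks, always exhausting more urgent levels first.
  List<String> chooseNext(int max) {
    List<String> chosen = new ArrayList<String>();
    for (Set<String> level : levels) {
      for (String block : level) {
        if (chosen.size() == max) {
          return chosen;
        }
        chosen.add(block);
      }
    }
    return chosen;
  }
}
```

With two blocks queued at the same level, raising the priority of the second one makes it the first to be chosen in the next iteration, even though it was added later.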
{quote}
 If so, how should we calculate the saving in locking time?
{quote}
Let me check whether I can find some way to measure this, though it will 
probably be tricky.
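
One possible way to measure it, offered only as a sketch: wrap the work done under the write lock and accumulate the time the lock is held, then compare the totals with and without the patch. `LockTimeSketch` and its methods are hypothetical; the lock here is a plain `ReentrantReadWriteLock`, not the real namesystem lock.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper that records how long a write lock is held.
class LockTimeSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final AtomicLong heldNanos = new AtomicLong();
  private final AtomicLong acquisitions = new AtomicLong();

  // Runs the given work under the write lock and records the hold time.
  void runUnderWriteLock(Runnable work) {
    lock.writeLock().lock();
    long start = System.nanoTime();
    try {
      work.run();
    } finally {
      heldNanos.addAndGet(System.nanoTime() - start);
      acquisitions.incrementAndGet();
      lock.writeLock().unlock();
    }
  }

  long totalHeldNanos() {
    return heldNanos.get();
  }

  long acquisitionCount() {
    return acquisitions.get();
  }
}
```

This only captures hold time, not the time other threads spend waiting for the lock, so it would give a lower bound on the saving.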





> When same block came for replication for Striped mode, we can move that block 
> to PendingReplications
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9381
>                 URL: https://issues.apache.org/jira/browse/HDFS-9381
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding, namenode
>    Affects Versions: 3.0.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-9381-02.patch, HDFS-9381-03.patch, 
> HDFS-9381-04.patch, HDFS-9381.00.patch, HDFS-9381.01.patch
>
>
> I noticed that the replication flow for striped blocks currently just 
> returns null if the block already exists in pendingReplications.
> {code}
> if (block.isStriped()) {
>       if (pendingNum > 0) {
>         // Wait the previous recovery to finish.
>         return null;
>       }
> {code}
>  If we just return null here and neededReplications contains only a few 
> blocks (basically, by default, fewer than numliveNodes*2), the same blocks 
> can be picked again from neededReplications in the next loop, because we are 
> not removing the element from neededReplications. Since this replication 
> process has to take the FSNamesystem lock, we may spend some time 
> unnecessarily in every loop. 
> So my suggestion/improvement is:
>  Instead of just returning null, how about incrementing pendingReplications 
> for this block and removing it from neededReplications? Another point to 
> consider here: to add a block to pendingReplications we generally need a 
> target, which is simply the node to which we issued the replication command. 
> Later, once replication succeeds and the DN reports the block, it is removed 
> from pendingReplications in the NN's addBlock. 
>  Since this block has just been picked from neededReplications, we would not 
> have selected a target yet. So which target should be passed to 
> pendingReplications if we add this block? One option I am thinking of is to 
> pass srcNode itself as the target for this special condition. If the block 
> is really missing, srcNode will not report it, so the block will not be 
> removed from pendingReplications. When it times out, it will be considered 
> for replication again, and that time an actual target will be found as part 
> of the regular replication flow.
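
The proposal in the description above can be sketched roughly as follows. Everything here is a simplified model rather than HDFS code: `StripedDeferSketch`, `PendingInfo`, and all method names are hypothetical, blocks and nodes are plain strings, and the clock is passed in explicitly. Refreshing the timestamp when the placeholder is added is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of neededReplications / pendingReplications.
class StripedDeferSketch {
  private static final class PendingInfo {
    final List<String> targets = new ArrayList<String>();
    long lastUpdatedMs;
  }

  private final Set<String> neededReplications = new LinkedHashSet<String>();
  private final Map<String, PendingInfo> pendingReplications =
      new HashMap<String, PendingInfo>();
  private final long timeoutMs;

  StripedDeferSketch(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  void addNeeded(String block) {
    neededReplications.add(block);
  }

  // Records one more pending replication of the block towards the target.
  void increment(String block, String target, long nowMs) {
    PendingInfo info = pendingReplications.get(block);
    if (info == null) {
      info = new PendingInfo();
      pendingReplications.put(block, info);
    }
    info.targets.add(target);
    info.lastUpdatedMs = nowMs;
  }

  // The proposed behaviour: instead of returning null and leaving the block
  // in neededReplications, record srcNode as a placeholder target and remove
  // the block from neededReplications.
  void deferStriped(String block, String srcNode, long nowMs) {
    increment(block, srcNode, nowMs);
    neededReplications.remove(block);
  }

  // Timed-out entries go back to neededReplications for regular processing.
  List<String> requeueTimedOut(long nowMs) {
    List<String> requeued = new ArrayList<String>();
    Iterator<Map.Entry<String, PendingInfo>> it =
        pendingReplications.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, PendingInfo> entry = it.next();
      if (nowMs - entry.getValue().lastUpdatedMs >= timeoutMs) {
        String block = entry.getKey();
        it.remove();
        neededReplications.add(block);
        requeued.add(block);
      }
    }
    return requeued;
  }

  boolean isNeeded(String block) {
    return neededReplications.contains(block);
  }

  int pendingCount(String block) {
    PendingInfo info = pendingReplications.get(block);
    return info == null ? 0 : info.targets.size();
  }
}
```

Because the block leaves neededReplications until the timeout fires, the scheduling loop no longer picks it up on every iteration while the earlier recovery is still in flight.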



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
