[ 
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981734#comment-14981734
 ] 

Yi Liu commented on HDFS-9275:
------------------------------

{code}
+      if (pendingNum > 0) {
+        // Wait the previous recovery to finish.
+        return false;
+      }
{code}
We also need to do two more things:
1. Also check this in {{scheduleRecovery}}, to avoid choosing targets unnecessarily.
2. Move the block group to the end of the queue for its priority in 
{{neededReplications}}; otherwise it will be chosen first again next time.
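
To illustrate the two points, here is a minimal standalone sketch. It is not the actual {{BlockManager}} code: the class {{RecoveryQueueSketch}}, its methods, and the use of plain string ids for block groups are all hypothetical, and a scheduled block group simply leaves the queue here. It only shows the intended behaviour, i.e. skip a block group with pending work before any targets would be chosen, and requeue it at the tail of its priority level.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the scheduler; not the real BlockManager API. */
public class RecoveryQueueSketch {
  // One FIFO queue of block group ids per priority level (0 = most urgent).
  private final List<Deque<String>> queues = new ArrayList<>();
  // Number of recovery tasks still in flight for each block group.
  private final Map<String, Integer> pending = new HashMap<>();

  public RecoveryQueueSketch(int levels) {
    for (int i = 0; i < levels; i++) {
      queues.add(new ArrayDeque<>());
    }
  }

  public void add(String blockGroup, int priority) {
    queues.get(priority).addLast(blockGroup);
  }

  public void setPending(String blockGroup, int count) {
    pending.put(blockGroup, count);
  }

  /** Returns the block group scheduled at this priority, or null if none is eligible. */
  public String scheduleNext(int priority) {
    Deque<String> q = queues.get(priority);
    // Examine each queued entry at most once per call.
    for (int i = q.size(); i > 0; i--) {
      String bg = q.pollFirst();
      if (pending.getOrDefault(bg, 0) > 0) {
        // Previous recovery has not finished: skip it before any target
        // selection, and move it to the tail so it is not picked first again.
        q.addLast(bg);
        continue;
      }
      pending.put(bg, 1); // the newly scheduled task is now in flight
      return bg;
    }
    return null;
  }

  /** Copy of the queue contents at one priority, head first. */
  public List<String> snapshot(int priority) {
    return new ArrayList<>(queues.get(priority));
  }
}
```

With {{bg1}} and {{bg2}} queued at the same priority and {{bg1}} still pending, a call to {{scheduleNext}} picks {{bg2}} and leaves {{bg1}} queued for a later pass.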



> Wait previous ErasureCodingWork to finish before schedule another one
> ---------------------------------------------------------------------
>
>                 Key: HDFS-9275
>                 URL: https://issues.apache.org/jira/browse/HDFS-9275
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>            Assignee: Walter Su
>         Attachments: HDFS-9275.01.patch, HDFS-9275.02.patch, 
> HDFS-9275.03.patch, HDFS-9275.04.patch
>
>
> In {{ErasureCodingWorker}}, for the same block group, one task doesn't know 
> which internal blocks are being recovered by other tasks. We could end up 
> recovering 2 identical blocks with the same index. So, {{ReplicationMonitor}} 
> should wait for the previous work to finish before scheduling another one.
> This is related to the occasional failure of {{TestRecoverStripedFile}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
