[
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981734#comment-14981734
]
Yi Liu commented on HDFS-9275:
------------------------------
{code}
+ if (pendingNum > 0) {
+ // Wait for the previous recovery to finish.
+ return false;
+ }
{code}
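The guard in the patch fragment above can be pictured in isolation. This is a minimal sketch, assuming a per-block-group count of in-flight recovery tasks. The class {{PendingRecoveryCounter}} and its methods are hypothetical illustrations, not code from the HDFS-9275 patches.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not code from the HDFS-9275 patches: counts in-flight
// recovery tasks per block group so a second task is not scheduled while the
// first one is still running.
class PendingRecoveryCounter {
  // blockGroupId -> number of recovery tasks scheduled but not yet finished
  private final Map<Long, Integer> pending = new HashMap<>();

  synchronized int pendingNum(long blockGroupId) {
    return pending.getOrDefault(blockGroupId, 0);
  }

  // Mirrors the guard in the patch: refuse to schedule while work is pending.
  synchronized boolean tryStartRecovery(long blockGroupId) {
    int pendingNum = pendingNum(blockGroupId);
    if (pendingNum > 0) {
      // Wait for the previous recovery to finish.
      return false;
    }
    pending.put(blockGroupId, pendingNum + 1);
    return true;
  }

  // Called when a task completes, fails, or times out.
  synchronized void finishRecovery(long blockGroupId) {
    Integer n = pending.get(blockGroupId);
    if (n == null) {
      return; // nothing pending for this group
    }
    if (n > 1) {
      pending.put(blockGroupId, n - 1);
    } else {
      pending.remove(blockGroupId);
    }
  }
}
{code}

With this guard, two tasks can never be recovering the same block group at once, so they cannot both pick the same missing internal block index.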
We also need to do two more things:
1. Also check this in {{scheduleRecovery}} to avoid choosing targets unnecessarily.
2. Move the block group to the end of the queue of the same priority in
{{neededReplications}}; otherwise it will be chosen first again next time.
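The two suggestions can be sketched together, assuming {{neededReplications}} is modeled as one insertion-ordered set of block group IDs per priority level (index 0 most urgent). {{NeededRecoveryQueues}}, {{moveToEndOfSamePriority}}, and the {{LongPredicate}} standing in for the pending check are hypothetical; the {{scheduleRecovery}} below only borrows the name from the comment and does not reflect the real method's signature. Removing a scheduled group from the queue is also a modeling assumption.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical model of neededReplications: one insertion-ordered set per priority.
class NeededRecoveryQueues {
  private final List<LinkedHashSet<Long>> levels = new ArrayList<>();

  NeededRecoveryQueues(int numPriorities) {
    for (int i = 0; i < numPriorities; i++) {
      levels.add(new LinkedHashSet<>());
    }
  }

  void add(long blockGroupId, int priority) {
    levels.get(priority).add(blockGroupId);
  }

  // Re-adding an existing element does not change LinkedHashSet order,
  // so remove it first, then append it at the end of the same level.
  void moveToEndOfSamePriority(long blockGroupId, int priority) {
    LinkedHashSet<Long> level = levels.get(priority);
    if (level.remove(blockGroupId)) {
      level.add(blockGroupId);
    }
  }

  List<Long> snapshot(int priority) {
    return new ArrayList<>(levels.get(priority));
  }

  // Picks up to maxWork block groups, most urgent level first. Groups with
  // pending work are skipped before any target selection (suggestion 1)
  // and rotated to the end of their level (suggestion 2).
  List<Long> scheduleRecovery(int maxWork, LongPredicate hasPendingWork) {
    List<Long> scheduled = new ArrayList<>();
    for (int p = 0; p < levels.size() && scheduled.size() < maxWork; p++) {
      // Iterate over a copy so the live set can be reordered safely.
      for (long id : new ArrayList<>(levels.get(p))) {
        if (scheduled.size() >= maxWork) {
          break;
        }
        if (hasPendingWork.test(id)) {
          moveToEndOfSamePriority(id, p);
          continue; // no targets are chosen for this group
        }
        // A real scheduler would choose targets for this group here.
        levels.get(p).remove(id);
        scheduled.add(id);
      }
    }
    return scheduled;
  }
}
{code}

Rotating within the same priority, rather than demoting the group, keeps its urgency intact while letting other groups at that level go first.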
> Wait previous ErasureCodingWork to finish before schedule another one
> ---------------------------------------------------------------------
>
> Key: HDFS-9275
> URL: https://issues.apache.org/jira/browse/HDFS-9275
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Walter Su
> Assignee: Walter Su
> Attachments: HDFS-9275.01.patch, HDFS-9275.02.patch,
> HDFS-9275.03.patch, HDFS-9275.04.patch
>
>
> In {{ErasureCodingWorker}}, for the same block group, one task doesn't know
> which internal blocks are being recovered by other tasks. We could end up
> recovering 2 identical blocks with the same index. So {{ReplicationMonitor}}
> should wait for the previous work to finish before scheduling another one.
> This is related to the occasional failure of {{TestRecoverStripedFile}}.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)