Shuyan Zhang created HDFS-17094:
-----------------------------------

             Summary: EC: Fix bug in block recovery when there are stale datanodes
                 Key: HDFS-17094
                 URL: https://issues.apache.org/jira/browse/HDFS-17094
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Shuyan Zhang

When a block recovery occurs, `RecoveryTaskStriped` in the datanode expects `rBlock.getLocations()` and `rBlock.getBlockIndices()` to be in one-to-one correspondence. However, if some locations are in the stale state when the NameNode handles the heartbeat, this correspondence is broken: `recoveryLocations` contains no stale locations, but the block indices array is still complete (i.e. it contains the indices of all locations, stale ones included). As a result, `BlockRecoveryWorker.RecoveryTaskStriped#recover` generates a wrong internal block ID, the corresponding datanode cannot find the replica, and the recovery process fails. This bug needs to be fixed.
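
The mismatch described above can be illustrated with a small standalone sketch. It is not HDFS code: the class and method names are hypothetical, the block group ID `1000` is an arbitrary value, and it assumes that an internal block ID is derived as the block group ID plus the block index. It compares the IDs requested from the surviving datanodes when only the locations are filtered against the IDs requested when locations and indices are filtered together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in HDFS.
class StaleLocationSketch {

    // Assumption: internal block ID = block group ID + block index.
    static long internalBlockId(long blockGroupId, int blockIndex) {
        return blockGroupId + blockIndex;
    }

    // Buggy pairing: stale locations are dropped, but the indices array is
    // left complete, so the k-th surviving location is paired with indices[k].
    static long[] idsWithUnfilteredIndices(long groupId, boolean[] stale, byte[] indices) {
        List<Long> ids = new ArrayList<>();
        int pos = 0; // position within the filtered location list
        for (int i = 0; i < stale.length; i++) {
            if (!stale[i]) {
                ids.add(internalBlockId(groupId, indices[pos]));
                pos++;
            }
        }
        return ids.stream().mapToLong(Long::longValue).toArray();
    }

    // Consistent pairing: each surviving location keeps its own index.
    static long[] idsWithFilteredIndices(long groupId, boolean[] stale, byte[] indices) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < stale.length; i++) {
            if (!stale[i]) {
                ids.add(internalBlockId(groupId, indices[i]));
            }
        }
        return ids.stream().mapToLong(Long::longValue).toArray();
    }

    public static void main(String[] args) {
        long groupId = 1000L;                    // arbitrary example value
        boolean[] stale = {false, true, false};  // the second location is stale
        byte[] indices = {0, 1, 2};              // complete indices array

        // The third datanode holds internal block 1002 but is asked for 1001.
        System.out.println(Arrays.toString(idsWithUnfilteredIndices(groupId, stale, indices))); // [1000, 1001]
        System.out.println(Arrays.toString(idsWithFilteredIndices(groupId, stale, indices)));   // [1000, 1002]
    }
}
```

In this sketch the datanode at the third location is asked for an internal block it does not store, which mirrors the "cannot find the replica" failure in the report. Filtering the indices alongside the locations is one plausible way to restore the correspondence; the actual patch may take a different approach.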