[
https://issues.apache.org/jira/browse/HDFS-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shuyan Zhang updated HDFS-17094:
--------------------------------
Description: When a block recovery occurs, `RecoveryTaskStriped` in the
datanode expects `rBlock.getLocations()` and `rBlock.getBlockIndices()` to be
in one-to-one correspondence. However, if some locations are in a stale state
when the NameNode handles the heartbeat, this correspondence is broken.
Specifically, `recoveryLocations` contains no stale locations, but the block
indices array is still complete (i.e. it contains the indices of all the
locations). This causes `BlockRecoveryWorker.RecoveryTaskStriped#recover`
to generate a wrong internal block ID, so the corresponding datanode cannot
find the replica and the recovery process fails. This bug needs to be
fixed.
> EC: Fix bug in block recovery when there are stale datanodes
> ------------------------------------------------------------
>
> Key: HDFS-17094
> URL: https://issues.apache.org/jira/browse/HDFS-17094
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
>
> When a block recovery occurs, `RecoveryTaskStriped` in the datanode expects
> `rBlock.getLocations()` and `rBlock.getBlockIndices()` to be in one-to-one
> correspondence. However, if some locations are in a stale state when the
> NameNode handles the heartbeat, this correspondence is broken. Specifically,
> `recoveryLocations` contains no stale locations, but the block indices array
> is still complete (i.e. it contains the indices of all the locations). This
> causes `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate a wrong
> internal block ID, so the corresponding datanode cannot find the replica
> and the recovery process fails. This bug needs to be fixed.
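
To illustrate the mismatch described above, here is a minimal standalone
sketch. It is not the Hadoop code: the class, method names, datanode names and
the block group ID 1000 are all hypothetical, and it assumes that an internal
block ID is derived as block group ID plus block index. It only shows how
pairing a filtered location list with an unfiltered index array shifts every
index after the first stale location.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the location/index pairing; not the HDFS implementation.
final class StaleIndexMismatchSketch {

    // Assumption: internal block ID = block group ID + index within the group.
    static long internalBlockId(long blockGroupId, byte blockIndex) {
        return blockGroupId + blockIndex;
    }

    // Mirrors the reported bug: stale locations are dropped, but the index
    // array is left complete, so position i no longer refers to the same node.
    static List<String> pairUnfiltered(long groupId, String[] locations,
                                       boolean[] stale, byte[] indices) {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < locations.length; i++) {
            if (!stale[i]) {
                live.add(locations[i]);
            }
        }
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < live.size(); i++) {
            pairs.add(live.get(i) + "->" + internalBlockId(groupId, indices[i]));
        }
        return pairs;
    }

    // One possible remedy: filter locations and indices in the same pass so
    // the one-to-one correspondence is preserved.
    static List<String> pairFiltered(long groupId, String[] locations,
                                     boolean[] stale, byte[] indices) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < locations.length; i++) {
            if (!stale[i]) {
                pairs.add(locations[i] + "->" + internalBlockId(groupId, indices[i]));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] locations = {"dn0", "dn1", "dn2"};
        boolean[] stale = {false, true, false};   // dn1 is stale
        byte[] indices = {0, 1, 2};
        System.out.println(pairUnfiltered(1000L, locations, stale, indices));
        System.out.println(pairFiltered(1000L, locations, stale, indices));
    }
}
```

In this sketch, the unfiltered pairing asks dn2 for internal block 1001, which
it does not hold, while the filtered pairing asks it for 1002. Whether the
actual patch filters the indices on the NameNode side or handles the mismatch
elsewhere is not stated in this message.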