[ https://issues.apache.org/jira/browse/HDFS-17151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752780#comment-17752780 ]
ASF GitHub Bot commented on HDFS-17151:
---------------------------------------
zhangshuyan0 opened a new pull request, #5938:
URL: https://github.com/apache/hadoop/pull/5938
When a datanode completes a block recovery, it calls the
`commitBlockSynchronization` method to notify the NN of the new locations of the
block. For an EC block group, the NN determines the index of each internal block
from the position of the corresponding DatanodeID in the parameter `newtargets`.
https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L4059-L4081
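For illustration, here is a minimal standalone sketch of that positional mapping
(the class name, string datanode names, and block group id are hypothetical, and
the types are simplified; this is not the actual FSNamesystem code):
```java
// Minimal sketch: the NN assumes the i-th entry of `newtargets` holds the
// internal block with index i, so the internal block id is blockGroupId + i.
public class PositionalIndexSketch {
  public static void main(String[] args) {
    long blockGroupId = 1000L;  // hypothetical block group id
    String[] newtargets = {"dn0", "dn1", "dn2", "dn3", "dn4"};
    for (int i = 0; i < newtargets.length; i++) {
      // The index is taken from the array position, not from the replica itself.
      System.out.println(newtargets[i] + " -> internal block index " + i
          + " (block id " + (blockGroupId + i) + ")");
    }
  }
}
```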
If the internal blocks written by the client do not have contiguous indices,
the current datanode code can cause the NN to record incorrect block metadata.
For simplicity, let's take RS(3,2) as an example. The timeline of the problem
is as follows:
1. The client plans to write internal blocks with indices [0,1,2,3,4] to
datanodes [dn0, dn1, dn2, dn3, dn4] respectively. But dn1 is unreachable, so the
client only writes data to the remaining 4 datanodes;
2. Client crashes;
3. NN fails over;
4. Now the content of `uc.getExpectedStorageLocations()` in the new ANN depends
entirely on block reports, and it is now <dn0, dn2, dn3, dn4>;
5. When the lease hard limit expires, the NN issues a block recovery command;
6. The datanode that receives the recovery command fills `DatanodeID[] newLocs`
with [dn0, null, dn2, dn3, dn4];
https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java#L471-L480
7. The serialization process filters out null values, so the parameters passed
to the NN become [dn0, dn2, dn3, dn4];
https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java#L322-L328
8. The NN mistakenly believes that dn2 stores the internal block with index 1,
dn3 stores the internal block with index 2, and so on (see the sketch after this
list).
https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L4068-L4080
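The following standalone sketch reproduces steps 6-8 in miniature (simplified
types and hypothetical names; the real code paths are the BlockRecoveryWorker
and DatanodeProtocolClientSideTranslatorPB links above): the null placeholder
for the missing index-1 replica is dropped during serialization, so the NN
re-derives indices from the shifted positions.
```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the failure mode (simplified types, hypothetical names).
public class NullFilterShiftSketch {
  public static void main(String[] args) {
    // Step 6: the recovery worker fills newLocs; dn1 never received data,
    // so the slot for internal block index 1 is left null.
    String[] newLocs = {"dn0", null, "dn2", "dn3", "dn4"};

    // Step 7: serialization keeps only non-null entries, losing the gap.
    List<String> newtargets = new ArrayList<>();
    for (String dn : newLocs) {
      if (dn != null) {
        newtargets.add(dn);
      }
    }
    System.out.println("newtargets sent to NN: " + newtargets);  // [dn0, dn2, dn3, dn4]

    // Step 8: the NN assigns index i to the i-th entry, so dn2 is wrongly
    // recorded as holding internal block index 1, dn3 as index 2, and so on.
    for (int i = 0; i < newtargets.size(); i++) {
      System.out.println(newtargets.get(i) + " recorded with block index " + i);
    }
  }
}
```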
The above timeline is just one example; other situations can lead to the same
error, such as a pipeline update on the client side. We should fix this bug.
> EC: Fix wrong metadata in BlockInfoStriped after recovery
> ---------------------------------------------------------
>
> Key: HDFS-17151
> URL: https://issues.apache.org/jira/browse/HDFS-17151
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shuyan Zhang
> Priority: Major
>