[ https://issues.apache.org/jira/browse/HDFS-17151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752780#comment-17752780 ]

ASF GitHub Bot commented on HDFS-17151:
---------------------------------------

zhangshuyan0 opened a new pull request, #5938:
URL: https://github.com/apache/hadoop/pull/5938

   When the datanode completes a block recovery, it calls the 
`commitBlockSynchronization` method to notify the NN of the new locations of 
the block. For an EC block group, the NN determines the block index of each 
storage based on the position of its DatanodeID in the `newtargets` parameter.
   
https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L4059-L4081
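   As a minimal sketch of this position-to-index assumption (illustrative 
names, not the actual Hadoop code; see the linked `commitBlockSynchronization` 
path), the i-th storage in `newtargets` is taken to hold the internal block 
with index i:

```java
// Illustrative sketch, not the actual Hadoop code: for a striped block
// group, the NN derives each storage's internal block index purely from
// its position in the reported target array.
public class PositionIsIndexSketch {

  // Assumed invariant: the i-th reported storage holds the internal block
  // with index i, whose block id is the block group id plus i.
  static long internalBlockId(long blockGroupId, int positionInNewtargets) {
    return blockGroupId + positionInNewtargets;
  }

  public static void main(String[] args) {
    long blockGroupId = 1_000_000L; // placeholder id, for illustration only
    String[] newtargets = {"dn0", "dn1", "dn2", "dn3", "dn4"};
    for (int i = 0; i < newtargets.length; i++) {
      System.out.println(newtargets[i] + " -> internal block id "
          + internalBlockId(blockGroupId, i) + " (index " + i + ")");
    }
  }
}
```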
   If the internal blocks written by the client don't have contiguous indices, 
the current datanode code can cause the NN to record incorrect block metadata. 
   For simplicity, let's take RS(3,2) as an example. The timeline of the 
problem is as follows:
   1. The client plans to write internal blocks with indices [0,1,2,3,4] to 
datanodes [dn0, dn1, dn2, dn3, dn4] respectively, but dn1 is unreachable, so 
the client only writes data to the remaining four datanodes;
   2. Client crashes;
   3. NN fails over;
   4. Now the content of `uc.getExpectedStorageLocations()` on the new active 
NN depends entirely on block reports, and it is <dn0, dn2, dn3, dn4>;
   5. When the lease hard limit expires, the NN issues a block recovery command;
   6. The datanode that receives the recovery command fills `DatanodeID[] 
newLocs` with [dn0, null, dn2, dn3, dn4];
   
https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java#L471-L480
   7. The serialization process filters out null values, so the parameter 
passed to the NN becomes [dn0, dn2, dn3, dn4];
   
https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java#L322-L328
   8. The NN mistakenly believes that dn2 stores the internal block with index 
1, dn3 stores the internal block with index 2, and so on (see the sketch after 
this list).
   
https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L4068-L4080
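
   To make the index shift concrete, here is a small self-contained simulation 
of steps 6-8 (hypothetical names only; the real logic lives in the 
`BlockRecoveryWorker` and `DatanodeProtocolClientSideTranslatorPB` code linked 
above). The null left for the missing index-1 replica is dropped during 
serialization, so every later entry shifts one position to the left when the 
NN maps positions back to indices:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Self-contained simulation of the reported index shift; the names are
// illustrative, not the actual Hadoop classes.
public class CommitBlockSyncShiftDemo {
  public static void main(String[] args) {
    // What each datanode really stores after the failed write: dn1 never
    // received its internal block, so index 1 has no replica.
    Map<String, Integer> actualIndex = new LinkedHashMap<>();
    actualIndex.put("dn0", 0);
    actualIndex.put("dn2", 2);
    actualIndex.put("dn3", 3);
    actualIndex.put("dn4", 4);

    // Step 6: the recovering datanode fills newLocs by block index,
    // leaving a null hole at the missing index 1.
    String[] newLocs = {"dn0", null, "dn2", "dn3", "dn4"};

    // Step 7: serialization filters out null values, collapsing the array.
    List<String> onTheWire = new ArrayList<>();
    for (String dn : newLocs) {
      if (dn != null) {
        onTheWire.add(dn);
      }
    }

    // Step 8: the NN assumes position == internal block index, so every
    // entry after the hole is recorded with an index that is one too small.
    for (int i = 0; i < onTheWire.size(); i++) {
      String dn = onTheWire.get(i);
      System.out.println(dn + ": NN records index " + i
          + ", actual index " + actualIndex.get(dn));
    }
  }
}
```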
   
   The above timeline is just one example; other situations can lead to the 
same error, such as when a pipeline update occurs on the client side. We should 
fix this bug.




> EC: Fix wrong metadata in BlockInfoStriped after recovery
> ---------------------------------------------------------
>
>                 Key: HDFS-17151
>                 URL: https://issues.apache.org/jira/browse/HDFS-17151
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Priority: Major
>


