[ https://issues.apache.org/jira/browse/HDFS-17003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721921#comment-17721921 ]

ASF GitHub Bot commented on HDFS-17003:
---------------------------------------

sodonnel commented on PR #5643:
URL: https://github.com/apache/hadoop/pull/5643#issuecomment-1544680459

   If I understand correctly, for a replicated block with two corrupt 
replicas, the code in InvalidateCorruptReplicas will be called once the 
block has been re-replicated correctly. At that point there will be 3 good 
replicas and 2 corrupt ones stored in the corruptReplicas map. The code in 
the above method will then iterate over those two and "invalidate" them on 
the datanodes they are stored on.
   
   For EC, the same applies; however, we are sending the blockID + index of 
the last reported replica to both DNs. All we store in the corruptReplicas 
map is the block group ID (i.e. the block ID with the replica index stripped 
out) and the list of nodes hosting it. At this point in the code we don't 
know which index is on each of the nodes hosting a corrupt replica. Is this 
correct?
   
   It's not clear to me how the change in this PR fixes the problem:
   
   ```
           if (blk.isStriped()) {
             DatanodeStorageInfo[] storages = getStorages(blk);
             for (DatanodeStorageInfo storage : storages) {
               final Block b = getBlockOnStorage(blk, storage);
               if (b != null) {
                 reported = b;
               }
             }
           }
   ```
   For each node stored, we get the storages for the block, which will be 
the nodes hosting it. Then we call getBlockOnStorage, which is sure to 
return non-null for each of the storages, as they all host a block in the 
group, right?
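   A toy model of that loop (with hypothetical `Storage` and `reportedFor` 
names standing in for the real DatanodeStorageInfo plumbing) shows the 
concern: since every storage in the group holds a non-null internal block, 
the unconditional `reported = b` means whichever storage is iterated last 
wins, regardless of which node the bad-block report actually came from:

   ```java
   import java.util.List;

   // Toy model of the quoted loop: "reported" is overwritten on every
   // iteration, so the last storage's internal block wins.
   public class LastWriterWins {
       record Storage(String node, long internalBlockId) {}

       static long reportedFor(List<Storage> storages) {
           long reported = 0;
           for (Storage s : storages) {
               Long b = s.internalBlockId(); // never null in this model
               if (b != null) {
                   reported = b; // overwrites each time through the loop
               }
           }
           return reported;
       }

       public static void main(String[] args) {
           long group = -9223372036848404320L;
           List<Storage> storages = List.of(
               new Storage("datanode1", group),      // index 0 (the corrupt one)
               new Storage("datanode2", group + 1)); // index 1
           // Even if datanode1 reported the bad block, index 1's ID comes out:
           System.out.println(reportedFor(storages)); // -9223372036848404319
       }
   }
   ```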
   
   Do we not need to somehow find the replica index for the block on each of 
the nodes listed, and then set up the "reported block" with the correct 
blockID + index for that node, passing that to invalidate?
   
   Would something like this work? NOTE: I have not tested this at all:
   
   ```
   -
   ```
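   The idea in the question above might be sketched like this (untested, 
hypothetical names throughout; the real fix would derive each node's replica 
index from its DatanodeStorageInfo for the striped block, not from a map): 
per corrupt node, look up which replica index that node stores and rebuild 
the internal block ID before invalidating, instead of sending one shared 
block ID to every node:

   ```java
   import java.util.Map;

   // Hypothetical sketch: rebuild the per-node internal block ID
   // (groupId | index, since the group ID's low 4 bits are zero) so each
   // datanode is told to invalidate the block it actually holds.
   public class PerNodeInvalidateSketch {
       static long internalBlockId(long groupId, int replicaIndex) {
           // Low 4 bits of a striped block ID encode the replica index.
           return groupId | replicaIndex;
       }

       public static void main(String[] args) {
           long groupId = -9223372036848404320L;
           // Assumed mapping of corrupt node -> replica index on that node:
           Map<String, Integer> corruptIndexByNode =
               Map.of("datanode1", 0, "datanode2", 1);

           corruptIndexByNode.forEach((node, idx) ->
               // In BlockManager this would be something like
               // addToInvalidates(reported, node) with a per-node block.
               System.out.println(node + " -> blk_"
                   + internalBlockId(groupId, idx)));
       }
   }
   ```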

> Erasure coding: invalidate wrong block after reporting bad blocks from 
> datanode
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-17003
>                 URL: https://issues.apache.org/jira/browse/HDFS-17003
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: farmmamba
>            Priority: Critical
>              Labels: pull-request-available
>
> After receiving a reportBadBlocks RPC from a datanode, the NameNode 
> computes the wrong block to invalidate. This is dangerous behaviour and 
> may cause data loss. Some logs from our production cluster are below:
>  
> NameNode log:
> {code:java}
> 2023-05-08 21:23:49,112 INFO org.apache.hadoop.hdfs.StateChange: *DIR* 
> reportBadBlocks for block: 
> BP-932824627-xxxx-1680179358678:blk_-9223372036848404320_1471186 on datanode: 
> datanode1:50010
> 2023-05-08 21:23:49,183 INFO org.apache.hadoop.hdfs.StateChange: *DIR* 
> reportBadBlocks for block: 
> BP-932824627-xxxx-1680179358678:blk_-9223372036848404319_1471186 on datanode: 
> datanode2:50010{code}
> datanode1 log:
> {code:java}
> 2023-05-08 21:23:49,088 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-932824627-xxxx-1680179358678:blk_-9223372036848404320_1471186 on 
> /data7/hadoop/hdfs/datanode
> 2023-05-08 21:24:00,509 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Failed 
> to delete replica blk_-9223372036848404319_1471186: ReplicaInfo not 
> found.{code}
>  
> This phenomenon can be reproduced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
