[
https://issues.apache.org/jira/browse/HDFS-17003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721156#comment-17721156
]
farmmamba commented on HDFS-17003:
----------------------------------
[~hexiaoqiao], hi sir. The data loss can be reproduced as below; the root
cause is explained at the end.
Suppose a file test.txt has data blocks d1-d6 and parity blocks r1-r3.
1. Corrupt d1 and d2 (echo 0 > d1; echo 0 > d2).
2. Run hdfs dfs -cat test.txt so the client reports bad d1 and d2, and the
NameNode reconstructs d1 to d1' and d2 to d2'. Because of the NameNode logic
described below, only d2 is invalidated, so the corrupt d1 remains.
3. Corrupt d1' and d2' (echo 0 > d1'; echo 0 > d2'), then run hdfs dfs -cat
test.txt again to reconstruct d1' to d1'' and d2' to d2''.
4. Corrupt the parity blocks: echo 0 > r1; echo 0 > r2; echo 0 > r3.
5. Wait a moment; the file is now corrupted and unrecoverable.
The root cause is that d1 and d1' are not deleted in time, so the NameNode
detects excess replicas of that internal block and then deletes the correct
copy, d1''.
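To illustrate the hazard in step 5, here is a toy sketch (not Hadoop's real
excess-redundancy code, whose selection policy lives in BlockManager and is
more involved): if the stale corrupt copies d1 and d1' were never invalidated,
the NameNode sees three registered replicas of the same internal block and
trims the excess without knowing which copies are corrupt, so the healthy d1''
can be the one deleted. The trim policy below (keep the earliest-reported
replicas) is an assumption made purely for the demonstration.

```java
import java.util.*;

// Toy model: three replicas of one EC internal block are registered because
// the corrupt ones were never invalidated; the NameNode trims down to the
// target redundancy without corruption information.
public class ExcessReplicaHazard {
    // Hypothetical policy: keep the first targetRedundancy replicas reported.
    static List<String> trimExcess(List<String> replicas, int targetRedundancy) {
        return new ArrayList<>(replicas.subList(0, targetRedundancy));
    }

    public static void main(String[] args) {
        // d1 and d1' are corrupt but still registered; d1'' is the good copy.
        List<String> replicas = List.of("d1 (corrupt)", "d1' (corrupt)", "d1'' (good)");
        // Only a corrupt copy survives; the good d1'' is deleted as excess.
        System.out.println(trimExcess(replicas, 1));
    }
}
```

Under this toy policy only a corrupt copy survives; once the parity blocks are
also destroyed in step 4, no healthy source remains for reconstruction.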
For more detail, see the code in the BlockManager#addStoredBlock method:
{code:java}
if ((corruptReplicasCount > 0) && (numLiveReplicas >= fileRedundancy)) {
invalidateCorruptReplicas(storedBlock, reportedBlock, num);
}{code}
If we corrupt two data blocks of an EC stripe, HDFS reconstructs both blocks
and the target datanodes send IBRs to the NameNode, which processes each one
in BlockManager#addStoredBlock. On the second data block's IBR, the NameNode
enters the if branch above, and the reportedBlock parameter is that second
block. In invalidateCorruptReplicas, corrupt replicas are added to
InvalidateBlocks using the reportedBlock, so this logic never invalidates the
corrupt copy of the block whose IBR arrived first.
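A minimal simulation of that mismatch (a toy model, not the real Hadoop
classes; the block ids are taken from the production logs quoted below in the
issue description): every datanode in the corrupt-replica map is asked to
delete the reportedBlock id, but in an EC group each internal block has its
own id, so the datanode holding the first-reported bad block cannot find the
replica it is told to delete.

```java
import java.util.*;

// Toy model of invalidateCorruptReplicas keyed by reportedBlock: each
// datanode holds one corrupt internal block of an EC block group.
public class InvalidateByReportedBlock {
    // Corrupt-replica map from the scenario in the logs: datanode -> block id.
    static Map<String, Long> sampleCorruptMap() {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("datanode1", -9223372036848404320L); // first bad block reported
        m.put("datanode2", -9223372036848404319L); // second bad block reported
        return m;
    }

    // Every datanode is asked to delete reportedBlockId, not the id it
    // actually stores; return the replicas that survive the invalidation.
    static List<String> invalidate(Map<String, Long> corrupt, long reportedBlockId) {
        List<String> leftover = new ArrayList<>();
        for (Map.Entry<String, Long> e : corrupt.entrySet()) {
            if (e.getValue() != reportedBlockId) {
                // Mirrors "Failed to delete replica ...: ReplicaInfo not found."
                leftover.add(e.getKey() + " still holds blk_" + e.getValue());
            }
        }
        return leftover;
    }

    public static void main(String[] args) {
        // The second IBR triggers the branch, so reportedBlock is the second id.
        for (String s : invalidate(sampleCorruptMap(), -9223372036848404319L)) {
            System.out.println(s);
        }
    }
}
```

The surviving corrupt replica on datanode1 is exactly the stale d1 copy that
later makes the NameNode's excess-replica accounting delete the wrong block.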
> Erasure coding: invalidate wrong block after reporting bad blocks from
> datanode
> -------------------------------------------------------------------------------
>
> Key: HDFS-17003
> URL: https://issues.apache.org/jira/browse/HDFS-17003
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: farmmamba
> Priority: Critical
>
> After receiving a reportBadBlocks RPC from a datanode, the NameNode computes
> the wrong block to invalidate. This is dangerous behaviour and may cause
> data loss. Some logs from our production cluster are below:
>
> NameNode log:
> {code:java}
> 2023-05-08 21:23:49,112 INFO org.apache.hadoop.hdfs.StateChange: *DIR*
> reportBadBlocks for block:
> BP-932824627-xxxx-1680179358678:blk_-9223372036848404320_1471186 on datanode:
> datanode1:50010
> 2023-05-08 21:23:49,183 INFO org.apache.hadoop.hdfs.StateChange: *DIR*
> reportBadBlocks for block:
> BP-932824627-xxxx-1680179358678:blk_-9223372036848404319_1471186 on datanode:
> datanode2:50010{code}
> datanode1 log:
> {code:java}
> 2023-05-08 21:23:49,088 WARN
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad
> BP-932824627-xxxx-1680179358678:blk_-9223372036848404320_1471186 on
> /data7/hadoop/hdfs/datanode
> 2023-05-08 21:24:00,509 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Failed
> to delete replica blk_-9223372036848404319_1471186: ReplicaInfo not
> found.{code}
>
> This phenomenon can be reproduced.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)