[
https://issues.apache.org/jira/browse/HDFS-17003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
farmmamba updated HDFS-17003:
-----------------------------
Description:
After receiving a reportBadBlocks RPC from a datanode, the NameNode computes the
wrong block to invalidate. This is dangerous behaviour and may cause data loss.
Some logs from our production cluster are below:
NameNode log:
{code:java}
2023-05-08 21:23:49,112 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks for block: BP-932824627-xxxx-1680179358678:blk_-9223372036848404320_1471186 on datanode: datanode1:50010
2023-05-08 21:23:49,183 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks for block: BP-932824627-xxxx-1680179358678:blk_-9223372036848404319_1471186 on datanode: datanode2:50010
{code}
datanode1 log:
{code:java}
2023-05-08 21:23:49,088 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad BP-932824627-xxxx-1680179358678:blk_-9223372036848404320_1471186 on /data7/hadoop/hdfs/datanode
2023-05-08 21:24:00,509 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Failed to delete replica blk_-9223372036848404319_1471186: ReplicaInfo not found.
{code}
This phenomenon can be reproduced.
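In the logs above, datanode1 reports blk_-9223372036848404320 as bad, yet the delete it receives is for blk_-9223372036848404319, which the NameNode log associates with datanode2. The following is a minimal sketch for decoding those two IDs. It assumes the striped block ID layout used by HDFS erasure coding, in which the low 4 bits of an internal block ID hold its index within the block group. The class and method names are hypothetical and are not Hadoop APIs.

```java
public class StripedBlockIdDecoder {
    // Assumption: the low 4 bits of a striped block ID are the index of the
    // internal block inside its block group (at most 16 blocks per group).
    static final long BLOCK_GROUP_INDEX_MASK = 15L;

    // ID of the block group the internal block belongs to (low 4 bits cleared).
    static long blockGroupId(long blockId) {
        return blockId & ~BLOCK_GROUP_INDEX_MASK;
    }

    // Position of the internal block within its block group.
    static int blockIndex(long blockId) {
        return (int) (blockId & BLOCK_GROUP_INDEX_MASK);
    }

    public static void main(String[] args) {
        // Block IDs copied from the logs in this issue.
        long reportedByDatanode1 = -9223372036848404320L;
        long deleteSentToDatanode1 = -9223372036848404319L;

        for (long id : new long[] {reportedByDatanode1, deleteSentToDatanode1}) {
            System.out.printf("blk_%d -> group=%d index=%d%n",
                id, blockGroupId(id), blockIndex(id));
        }
    }
}
```

Under that assumption both IDs belong to the same block group: the block reported by datanode1 has index 0, while the replica that datanode1 was asked to delete has index 1. This is consistent with the "ReplicaInfo not found" message on datanode1.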
was:
After receiving a reportBadBlocks RPC from a datanode, the NameNode computes the
wrong block to invalidate. This is dangerous behaviour and may cause data loss.
Some logs from our production cluster are below:
NameNode log:
{code:java}
2023-05-08 14:39:42,241 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks for block: BP-932824627-xxxx-1680179358678:blk_-9223372036846808880_1669008 on datanode: datanode1:50010
{code}
datanode1 log:
{code:java}
2023-05-08 14:39:42,183 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad BP-932824627-xxxx-1680179358678:blk_-9223372036846808880_1669008 on /data1/hadoop/hdfs/datanode
2023-05-08 14:39:47,338 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Failed to delete replica blk_-9223372036846808879_1669008: ReplicaInfo not found.
{code}
This phenomenon can be reproduced.
> Erasure coding: invalidate wrong block after reporting bad blocks from
> datanode
> -------------------------------------------------------------------------------
>
> Key: HDFS-17003
> URL: https://issues.apache.org/jira/browse/HDFS-17003
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: farmmamba
> Priority: Critical