[ https://issues.apache.org/jira/browse/HDFS-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802416#comment-17802416 ]
Shilun Fan commented on HDFS-15357:
-----------------------------------

Updated the target version in preparation for the 3.4.0 release.

> Do not trust bad block reports from clients
> -------------------------------------------
>
>                 Key: HDFS-15357
>                 URL: https://issues.apache.org/jira/browse/HDFS-15357
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Priority: Major
>
> {{reportBadBlocks()}} is implemented by both ClientNamenodeProtocol and
> DatanodeProtocol. When DFSClient calls it, a faulty client can cause
> data availability issues in a cluster.
> In the past we had such an incident, where a node with a faulty NIC was
> randomly corrupting data. All clients running on that machine reported all
> accessed blocks and all associated replicas as corrupt. More recently, a
> single faulty client process caused a small number of missing blocks. In
> all cases, the actual data was fine.
> Bad block reports from clients shouldn't be trusted blindly. Instead, the
> namenode should send a datanode command to verify the claim. A bonus would be
> to keep a record for a while and ignore repeated reports from the same
> nodes.
> At minimum, there should be an option to ignore bad block reports from
> clients, perhaps after logging them. A very crude way would be to
> short-circuit in {{ClientNamenodeProtocolServerSideTranslatorPB#reportBadBlocks()}}.
> A more sophisticated way would be to check for the datanode user name in
> {{FSNamesystem#reportBadBlocks()}}, so that a client report can be easily
> logged or optionally processed further.
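
For illustration only, the gating described in the issue (trust reports from
the datanode user, ask a datanode to verify a client's claim, and ignore
repeated reports from the same node for a while) could be sketched roughly as
below. This is a minimal sketch under assumptions, not an existing Hadoop API:
the class BadBlockReportFilter, the Decision enum, the quietPeriodMs parameter,
and matching against a single configured datanode user name are all
hypothetical. Real code would live near FSNamesystem#reportBadBlocks() and
would issue an actual datanode command instead of returning an enum.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names exist in Hadoop. */
class BadBlockReportFilter {
    enum Decision { ACCEPT, VERIFY_WITH_DATANODE, IGNORE_REPEAT }

    private final String datanodeUser;   // assumed: the user name datanodes run as
    private final long quietPeriodMs;    // assumed: how long repeated client reports are ignored
    // Time of the last client report that triggered a verification, per reporting host.
    private final Map<String, Long> lastVerifiedReportMs = new HashMap<>();

    BadBlockReportFilter(String datanodeUser, long quietPeriodMs) {
        this.datanodeUser = datanodeUser;
        this.quietPeriodMs = quietPeriodMs;
    }

    /** Decide how to treat one bad block report. */
    synchronized Decision decide(String callerUser, String reporterHost, long nowMs) {
        // Reports arriving as the datanode user are trusted as before.
        if (datanodeUser.equals(callerUser)) {
            return Decision.ACCEPT;
        }
        // A client that reported recently is ignored until the quiet period ends.
        Long last = lastVerifiedReportMs.get(reporterHost);
        if (last != null && nowMs - last < quietPeriodMs) {
            return Decision.IGNORE_REPEAT;
        }
        // Otherwise remember the report and have a datanode check the claim.
        lastVerifiedReportMs.put(reporterHost, nowMs);
        return Decision.VERIFY_WITH_DATANODE;
    }

    public static void main(String[] args) {
        BadBlockReportFilter filter = new BadBlockReportFilter("hdfs", 60_000L);
        System.out.println(filter.decide("hdfs", "dn1", 0L));        // ACCEPT
        System.out.println(filter.decide("alice", "host-a", 1_000L)); // VERIFY_WITH_DATANODE
        System.out.println(filter.decide("alice", "host-a", 2_000L)); // IGNORE_REPEAT
        System.out.println(filter.decide("alice", "host-a", 70_000L)); // VERIFY_WITH_DATANODE
    }
}
```

Keying the record on the reporting host rather than on the block reflects the
faulty-NIC incident above, where every client on one machine misreported; a
real implementation might also want to log ignored reports and bound the size
of the map.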