[ https://issues.apache.org/jira/browse/HDFS-16987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715888#comment-17715888 ]
ASF GitHub Bot commented on HDFS-16987:
---------------------------------------

ayushtkn commented on PR #5583:
URL: https://github.com/apache/hadoop/pull/5583#issuecomment-1520440465

I think it requires the replica with the older genstamp to also be on the same datanode as the replica with the newer genstamp. Are you able to reproduce it when the 1001 replicas and the 1002 replicas are on different datanodes? Otherwise this block deletes the corrupt replica immediately:
```
// the block is over-replicated so invalidate the replicas immediately
invalidateBlock(b, node, numberOfReplicas);
```
If you debug your test and step inside invalidateBlock():
```
// we already checked the number of replicas in the caller of this
// function and know there are enough live replicas, so we can delete it.
addToInvalidates(b.getCorrupted(), dn);
removeStoredBlock(b.getStored(), node);
```
This `removeStoredBlock(b.getStored(), node);` removes `node` from the stored block, which also held the 1002 genstamp, since all three 1001 and 1002 replicas sit on the same 3 datanodes. For the first two,
```
boolean minReplicationSatisfied = hasMinStorage(b.getStored(), numUsableReplicas);
```
stays satisfied, but because the previous two removals already deleted from the blocksMap the storages that contained 1002 (in order to get rid of 1001), in the last iteration it comes back false, and hence the last replica isn't deleted.

So I feel that for this to trigger, the 1001 and 1002 replicas need to be on the same datanodes.
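To make that sequence concrete, here is a minimal, self-contained toy model of the reasoning above. It is not the BlockManager API; the class name, MIN_STORAGE constant, and the storage lists are invented for illustration, and the usable-replica count is simplified to "remaining storages minus the one being invalidated".

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical names, not Hadoop code): three datanodes each hold
 * the stale 1001 replica and the live 1002 replica. Invalidating the 1001
 * replica on a node also removes that node's storage from the stored (1002)
 * block, so after two invalidations the min-storage check fails for the
 * third node and its corrupt replica is left in place.
 */
public class CorruptReplicaInvalidation {

  // Analogue of dfs.namenode.replication.min (default 1).
  static final int MIN_STORAGE = 1;

  public static void main(String[] args) {
    // Storages of the stored block (genstamp 1002), one per datanode.
    List<String> storedBlockStorages = new ArrayList<>(List.of("dn1", "dn2", "dn3"));
    // Datanodes that also reported the stale genstamp-1001 replica.
    List<String> corruptReplicaNodes = List.of("dn1", "dn2", "dn3");

    for (String node : corruptReplicaNodes) {
      // Simplified analogue of hasMinStorage(b.getStored(), numUsableReplicas):
      // count the storages that would remain after removing this node's replica.
      int numUsableReplicas = storedBlockStorages.size() - 1;
      boolean minReplicationSatisfied = numUsableReplicas >= MIN_STORAGE;

      if (minReplicationSatisfied) {
        // Analogue of invalidateBlock(): addToInvalidates + removeStoredBlock,
        // which shrinks the stored block's storage list.
        storedBlockStorages.remove(node);
        System.out.println("invalidated 1001 replica on " + node);
      } else {
        // Third iteration: the earlier removals already shrank the stored
        // block's storages, so the last corrupt replica is not deleted.
        System.out.println("kept 1001 replica on " + node
            + " (min storage not satisfied)");
      }
    }
  }
}
```

Running it prints two "invalidated" lines and one "kept" line, matching the observation that only the first two corrupt replicas are removed.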
> NameNode should remove all invalid corrupted blocks when starting active service
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-16987
>                 URL: https://issues.apache.org/jira/browse/HDFS-16987
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Critical
>              Labels: pull-request-available
>
> In our prod environment, we encountered an incident where an HA failover caused some new corrupted blocks, causing some jobs to fail.
>
> Tracing it down, we found a bug in the processing of all pending DN messages when starting the active services.
> The steps to reproduce are as follows:
> # Suppose NN1 is Active and NN2 is Standby; the Active works well and the Standby is unstable
> # Timing 1, the client creates a file, writes some data and closes it.
> # Timing 2, the client appends to this file, writes some data and closes it.
> # Timing 3, the Standby replays the second closing edits of this file
> # Timing 4, the Standby processes the blockReceivedAndDeleted of the first create operation
> # Timing 5, the Standby processes the blockReceivedAndDeleted of the second append operation
> # Timing 6, the admin switches the active namenode from NN1 to NN2
> # Timing 7, the client fails to append some data to this file.
> {code:java}
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): append: lastBlock=blk_1073741825_1002 of src=/testCorruptedBlockAfterHAFailover is not sufficiently replicated yet.
>     at org.apache.hadoop.hdfs.server.namenode.FSDirAppendOp.appendFile(FSDirAppendOp.java:138)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2992)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:858)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:527)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1221)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1144)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3170)
> {code}
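For reference, a minimal sketch of the create -> append -> failover -> append flow from the reproduction steps, written against Hadoop's HDFS test helpers (MiniDFSCluster, MiniDFSNNTopology, HATestUtil from the test jars), which are assumed to be on the classpath. The class name and file contents are made up, and the sketch does not stall the Standby's edit tailing or its blockReceivedAndDeleted processing (Timings 3 to 5), so by itself it will not reproduce the corruption; it only shows where those delays would have to be injected.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.MiniDFSNNTopology;
import org.apache.hadoop.hdfs.server.namenode.ha.HATestUtil;

/**
 * Sketch of the HA failover flow from the reproduction steps. The race
 * itself additionally needs the Standby (NN2) to process the IBRs of the
 * create and append operations out of order relative to the edits.
 */
public class HAFailoverAppendSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .nnTopology(MiniDFSNNTopology.simpleHATopology())
        .numDataNodes(3)
        .build();
    try {
      cluster.waitActive();
      cluster.transitionToActive(0);                 // NN1 active, NN2 standby
      FileSystem fs = HATestUtil.configureFailoverFs(cluster, conf);
      Path file = new Path("/testCorruptedBlockAfterHAFailover");

      // Timing 1: create, write some data, close.
      try (FSDataOutputStream out = fs.create(file)) {
        out.writeBytes("first");
      }
      // Timing 2: append, write some data, close (bumps the genstamp,
      // e.g. 1001 -> 1002).
      try (FSDataOutputStream out = fs.append(file)) {
        out.writeBytes("second");
      }

      // Timings 3-5 (not modeled here): NN2 replays the second close and
      // then processes the pending blockReceivedAndDeleted reports.

      // Timing 6: fail over from NN1 to NN2.
      cluster.transitionToStandby(0);
      cluster.transitionToActive(1);

      // Timing 7: in the reported bug this append fails with
      // "not sufficiently replicated yet" because NN2 marked the last
      // block's replicas corrupt while starting active services.
      try (FSDataOutputStream out = fs.append(file)) {
        out.writeBytes("third");
      }
    } finally {
      cluster.shutdown();
    }
  }
}
```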