KevinWikant opened a new pull request, #4568: URL: https://github.com/apache/hadoop/pull/4568
### Description of PR

Under certain conditions the Namenode can send the incorrect generationStamp to a datanode when invalidating a corrupt block replica:

- the generationStamp sent in the DNA_INVALIDATE is based on the [generationStamp of the block sent in the block report](https://github.com/apache/hadoop/blob/8774f178686487007dcf8c418c989b785a529000/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3733)
- the problem is that the datanode with the corrupt block replica (the one that receives the DNA_INVALIDATE) is not necessarily the same datanode that sent the block report
- this can cause the exception below when the corrupt block replica on the datanode receiving the DNA_INVALIDATE and the block replica on the datanode that sent the block report have different generationStamps

This results in the following datanode exception:

```
2022-07-16 08:07:52,041 [BP-958471676-X-1657973243350 heartbeating to localhost/127.0.0.1:61365] WARN datanode.DataNode (BPServiceActor.java:processCommand(887)) - Error processing datanode Command
java.io.IOException: Failed to delete 1 (out of 1) replica(s): 0) Failed to delete replica blk_1073741825_1005: GenerationStamp not matched, existing replica is blk_1073741825_1001
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2139)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2034)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:735)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:883)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:678)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:849)
	at java.lang.Thread.run(Thread.java:750)
```
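For context, here is a minimal, self-contained sketch of the datanode-side check that produces this exception. The class, field, and method names are illustrative only (this is not the actual `FsDatasetImpl` code): a replica is matched by both block ID and generationStamp, so a DNA_INVALIDATE carrying the reporting datanode's generationStamp (1005) can never delete a local corrupt replica stored with generationStamp 1001.

```
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for the datanode's replica map and the
// generation-stamp check that rejects the invalidate command.
public class InvalidateSketch {

  // Minimal stand-in for a replica record: block id + generation stamp.
  static final class Replica {
    final long blockId;
    final long genStamp;
    Replica(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  // Replicas stored on this datanode, keyed by block id.
  static final Map<Long, Replica> volumeMap = new HashMap<>();

  // Mirrors the failure mode: deletion is refused when the stamps differ.
  static void invalidate(long blockId, long genStampFromCommand) throws IOException {
    Replica local = volumeMap.get(blockId);
    if (local != null && local.genStamp != genStampFromCommand) {
      throw new IOException("Failed to delete replica blk_" + blockId + "_" + genStampFromCommand
          + ": GenerationStamp not matched, existing replica is blk_"
          + blockId + "_" + local.genStamp);
    }
    volumeMap.remove(blockId); // the corrupt replica is only removed when the stamps agree
  }

  public static void main(String[] args) throws IOException {
    // The corrupt replica on this datanode has genStamp=1001, but the DNA_INVALIDATE
    // carries genStamp=1005 taken from another datanode's block report.
    volumeMap.put(1073741825L, new Replica(1073741825L, 1001L));
    invalidate(1073741825L, 1005L); // throws; the corrupt replica is never deleted
  }
}
```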
### How was this patch tested?

Validated the fix by leveraging the unit test `TestDecommission#testDeleteCorruptReplicaForUnderReplicatedBlock`.

## Failed Test - Before this change

```
> mvn test -Dtest=TestDecommission#testDeleteCorruptReplicaForUnderReplicatedBlock
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   TestDecommission.testDeleteCorruptReplicaForUnderReplicatedBlock:2035 Node 127.0.0.1:61366 failed to complete decommissioning. numTrackedNodes=1 , numPendingNodes=0 , adminState=Decommission In Progress , nodesWithReplica=[127.0.0.1:61366, 127.0.0.1:61419]
```

```
> cat target/surefire-reports/org.apache.hadoop.hdfs.TestDecommission-output.txt | grep 'Expected Replicas:\|XXX\|FINALIZED\|Block now\|Failed to delete'
2022-07-16 08:07:45,891 [Listener at localhost/61378] INFO hdfs.TestDecommission (TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(1942)) - Block now has 2 corrupt replicas on [127.0.0.1:61370 , 127.0.0.1:61375] and 1 live replica on 127.0.0.1:61366
2022-07-16 08:07:45,913 [Listener at localhost/61378] INFO hdfs.TestDecommission (TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(1974)) - Block now has 2 corrupt replicas on [127.0.0.1:61370 , 127.0.0.1:61375] and 1 decommissioning replica on 127.0.0.1:61366
XXX invalidateBlock dn=127.0.0.1:61415 , blk=1073741825_1001
XXX postponeBlock dn=127.0.0.1:61415 , blk=1073741825_1001
XXX invalidateBlock dn=127.0.0.1:61419 , blk=1073741825_1003
XXX addToInvalidates dn=127.0.0.1:61419 , blk=1073741825_1003
XXX addBlocksToBeInvalidated dn=127.0.0.1:61419 , blk=1073741825_1003
XXX rescanPostponedMisreplicatedBlocks blk=1073741825_1005
XXX DNA_INVALIDATE dn=/127.0.0.1:61419 , blk=1073741825_1003
XXX invalidate(on DN) dn=/127.0.0.1:61419 , invalidBlk=blk_1073741825_1003 , blkByIdAndGenStamp = FinalizedReplica, blk_1073741825_1003, FINALIZED
2022-07-16 08:07:49,084 [BP-958471676-X-1657973243350 heartbeating to localhost/127.0.0.1:61365] INFO impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:deleteAsync(226)) - Scheduling blk_1073741825_1003 replica FinalizedReplica, blk_1073741825_1003, FINALIZED
XXX addBlock dn=127.0.0.1:61419 , blk=1073741825_1005 <<< block report is coming from 127.0.0.1:61419 which has genStamp=1005
XXX invalidateCorruptReplicas dn=127.0.0.1:61415 , reported_blk=1073741825_1005 <<< corrupt replica is on 127.0.0.1:61415 which is expecting genStamp=1001
XXX addToInvalidates dn=127.0.0.1:61415 , blk=1073741825_1005
2022-07-16 08:07:49,431 [DatanodeAdminMonitor-0] INFO BlockStateChange (DatanodeAdminManager.java:logBlockReplicationInfo(417)) - Block: blk_1073741825_1005, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: 127.0.0.1:61366 127.0.0.1:61419 , Current Datanode: 127.0.0.1:61366, Is current datanode decommissioning: true, Is current datanode entering maintenance: false
XXX addBlocksToBeInvalidated dn=127.0.0.1:61415 , blk=1073741825_1005 <<< Namenode sends wrong genStamp to 127.0.0.1:61415
XXX DNA_INVALIDATE dn=/127.0.0.1:61415 , blk=1073741825_1005
XXX invalidate(on DN) dn=/127.0.0.1:61415 , invalidBlk=blk_1073741825_1005 , blkByIdAndGenStamp = null
XXX invalidate(on DN) dn=/127.0.0.1:61415 , invalidBlk=blk_1073741825_1005 , blkById = FinalizedReplica, blk_1073741825_1001, FINALIZED
2022-07-16 08:07:52,041 [BP-958471676-X-1657973243350 heartbeating to localhost/127.0.0.1:61365] WARN datanode.DataNode (BPServiceActor.java:processCommand(887)) - Error processing datanode Command
java.io.IOException: Failed to delete 1 (out of 1) replica(s): 0) Failed to delete replica blk_1073741825_1005: GenerationStamp not matched, existing replica is blk_1073741825_1001
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2139)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2034)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:735)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:883)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:678)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:849)
	at java.lang.Thread.run(Thread.java:750)
2022-07-16 08:07:52,384 [DataXceiver for client at /127.0.0.1:61434 [Receiving block BP-958471676-X-1657973243350:blk_1073741825_1005]] INFO datanode.DataNode (DataXceiver.java:writeBlock(939)) - opWriteBlock BP-958471676-X-1657973243350:blk_1073741825_1005 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-958471676-X-1657973243350:blk_1073741825_1005 already exists in state FINALIZED and thus cannot be created.
2022-07-16 08:07:52,385 [DataXceiver for client at /127.0.0.1:61434 [Receiving block BP-958471676-X-1657973243350:blk_1073741825_1005]] INFO datanode.DataNode (DataXceiver.java:run(307)) - 127.0.0.1:61415:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:61434 dst: /127.0.0.1:61415; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-958471676-X-1657973243350:blk_1073741825_1005 already exists in state FINALIZED and thus cannot be created.
2022-07-16 08:07:54,422 [DatanodeAdminMonitor-0] INFO BlockStateChange (DatanodeAdminManager.java:logBlockReplicationInfo(417)) - Block: blk_1073741825_1005, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: 127.0.0.1:61366 127.0.0.1:61419 , Current Datanode: 127.0.0.1:61366, Is current datanode decommissioning: true, Is current datanode entering maintenance: false
...
2022-07-16 08:08:24,426 [DatanodeAdminMonitor-0] INFO BlockStateChange (DatanodeAdminManager.java:logBlockReplicationInfo(417)) - Block: blk_1073741825_1005, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: 127.0.0.1:61366 127.0.0.1:61419 , Current Datanode: 127.0.0.1:61366, Is current datanode decommissioning: true, Is current datanode entering maintenance: false
```

Note the inline comments above which illustrate the bug

## Successful Test - After this change

```
> mvn test -Dtest=TestDecommission#testDeleteCorruptReplicaForUnderReplicatedBlock
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
```

Logs:

```
> cat target/surefire-reports/org.apache.hadoop.hdfs.TestDecommission-output.txt | grep 'Expected Replicas:\|XXX\|FINALIZED\|Block now\|Failed to delete'
2022-07-16 07:54:30,648 [Listener at localhost/60376] INFO hdfs.TestDecommission (TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(1942)) - Block now has 2 corrupt replicas on [127.0.0.1:60364 , 127.0.0.1:60368] and 1 live replica on 127.0.0.1:60373
2022-07-16 07:54:30,669 [Listener at localhost/60376] INFO hdfs.TestDecommission (TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(1974)) - Block now has 2 corrupt replicas on [127.0.0.1:60364 , 127.0.0.1:60368] and 1 decommissioning replica on 127.0.0.1:60373
XXX invalidateBlock dn=127.0.0.1:60423 , blk=1073741825_1001
XXX postponeBlock dn=127.0.0.1:60423 , blk=1073741825_1001
XXX invalidateBlock dn=127.0.0.1:60427 , blk=1073741825_1003
XXX addToInvalidates dn=127.0.0.1:60427 , blk=1073741825_1003
XXX addBlocksToBeInvalidated dn=127.0.0.1:60427 , blk=1073741825_1003
XXX rescanPostponedMisreplicatedBlocks blk=1073741825_1005
XXX DNA_INVALIDATE dn=/127.0.0.1:60427 , blk=1073741825_1003
XXX invalidate(on DN) dn=/127.0.0.1:60427 , invalidBlk=blk_1073741825_1003 , blkByIdAndGenStamp = FinalizedReplica, blk_1073741825_1003, FINALIZED
2022-07-16 07:54:32,831 [BP-1469857843-X-1657972447604 heartbeating to localhost/127.0.0.1:60363] INFO impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:deleteAsync(226)) - Scheduling blk_1073741825_1003 replica FinalizedReplica, blk_1073741825_1003, FINALIZED
2022-07-16 07:54:33,772 [DatanodeAdminMonitor-0] INFO BlockStateChange (DatanodeAdminManager.java:logBlockReplicationInfo(417)) - Block: blk_1073741825_1005, Expected Replicas: 2, live replicas: 0, corrupt replicas: 1, decommissioned replicas: 0, decommissioning replicas: 1, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: 127.0.0.1:60373 127.0.0.1:60423 , Current Datanode: 127.0.0.1:60373, Is current datanode decommissioning: true, Is current datanode entering maintenance: false
XXX addBlock dn=127.0.0.1:60427 , blk=1073741825_1005
XXX invalidateCorruptReplicas dn=127.0.0.1:60423 , reported_blk=1073741825_1005
XXX getCorruptReplicaGenerationStamp dn=127.0.0.1:60423 , genStamp=1001
XXX addToInvalidates dn=127.0.0.1:60423 , blk=1073741825_1001
XXX addBlocksToBeInvalidated dn=127.0.0.1:60423 , blk=1073741825_1001
XXX DNA_INVALIDATE dn=/127.0.0.1:60423 , blk=1073741825_1001
XXX invalidate(on DN) dn=/127.0.0.1:60423 , invalidBlk=blk_1073741825_1001 , blkByIdAndGenStamp = FinalizedReplica, blk_1073741825_1001, FINALIZED
2022-07-16 07:54:35,796 [BP-1469857843-X-1657972447604 heartbeating to localhost/127.0.0.1:60363] INFO impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:deleteAsync(226)) - Scheduling blk_1073741825_1001 replica FinalizedReplica, blk_1073741825_1001, FINALIZED
XXX addBlock dn=127.0.0.1:60423 , blk=1073741825_1005
2022-07-16 07:54:40,768 [Listener at localhost/60430] INFO hdfs.TestDecommission (TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(2050)) - Block now has 2 live replicas on [127.0.0.1:60423 , 127.0.0.1:60427] and 1 decommissioned replica on 127.0.0.1:60373
```

Using `getCorruptReplicaGenerationStamp` allows the Namenode to get the correct generationStamp for the corrupt block replica (a rough, illustrative sketch of this lookup follows the checklist below).

### For code changes:

- [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
- [n/a] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
- [n/a] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [n/a] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
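To make the intent of the fix concrete, here is a rough, self-contained sketch of that idea, assuming the Namenode tracks the generationStamp of each corrupt replica per datanode. The class name, method signatures, and fallback behaviour below are assumptions for illustration only; they are not the actual `BlockManager`/`CorruptReplicasMap` changes in this PR.

```
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative stand-in for the Namenode-side lookup of the corrupt
// replica's generationStamp on the datanode being invalidated.
public class InvalidateCorruptReplicaSketch {

  // Generation stamps recorded for corrupt replicas of one block,
  // keyed by the datanode that holds the corrupt replica.
  static final Map<String, Long> corruptReplicaGenStamps = new HashMap<>();

  // Stand-in for the lookup the PR description refers to as getCorruptReplicaGenerationStamp.
  static Optional<Long> getCorruptReplicaGenerationStamp(String datanodeId) {
    return Optional.ofNullable(corruptReplicaGenStamps.get(datanodeId));
  }

  // Picks the generationStamp to put in the DNA_INVALIDATE for one datanode:
  // prefer the stamp of the corrupt replica actually stored on that datanode,
  // and only fall back to the reported stamp when nothing is recorded.
  static long genStampForInvalidate(String targetDatanode, long reportedGenStamp) {
    return getCorruptReplicaGenerationStamp(targetDatanode).orElse(reportedGenStamp);
  }

  public static void main(String[] args) {
    // 127.0.0.1:61415 holds the corrupt replica with genStamp=1001, while the block
    // report that triggered invalidation came from a node reporting genStamp=1005.
    corruptReplicaGenStamps.put("127.0.0.1:61415", 1001L);
    long genStamp = genStampForInvalidate("127.0.0.1:61415", 1005L);
    System.out.println("DNA_INVALIDATE genStamp = " + genStamp); // prints 1001
  }
}
```

In the logs above, this is the difference between `addToInvalidates dn=127.0.0.1:61415 , blk=1073741825_1005` in the failed run and `addToInvalidates dn=127.0.0.1:60423 , blk=1073741825_1001` in the successful run.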
