[
https://issues.apache.org/jira/browse/HDFS-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601373#comment-17601373
]
ASF GitHub Bot commented on HDFS-16664:
---------------------------------------
KevinWikant commented on PR #4568:
URL: https://github.com/apache/hadoop/pull/4568#issuecomment-1239579333
## Concrete Reproduce Steps
- Create a Hadoop cluster with:
- single Namenode (i.e. non-HA)
- 5 datanodes (DN1, DN2, DN3, DN4, DN5)
- dfs.namenode.corrupt.block.delete.immediately.enabled = false
- dfs.replication = 3
- Create the block with 3 replicas
```
echo "hello" > /tmp/test;
export HADOOP_USER_NAME=hdfs;
hdfs dfs -put /tmp/test /tmp/test;
hdfs dfs -ls /tmp/test;
```
- Determine the block locations of the 3 replicas
```
> hdfs fsck /tmp/test -files -blocks -locations;
...
0. BP-452161995-NN-1662558403599:blk_1073741825_1001 len=6 Live_repl=3
[DatanodeInfoWithStorage[DN1:9866,DS-XXX,DISK],
DatanodeInfoWithStorage[DN2:9866,DS-XXX,DISK],
DatanodeInfoWithStorage[DN3:9866,DS-XXX,DISK]]
...
```
- Stop DN1 & DN2
```
sudo systemctl disable hadoop-hdfs-datanode.service;
sudo systemctl stop hadoop-hdfs-datanode.service;
```
- Append the block which will cause it to be written to 2 new block locations
```
> hdfs dfs -appendToFile /tmp/test /tmp/test;
2022-09-07 13:49:58,779 INFO hdfs.DataStreamer: Exception in
createBlockOutputStream blk_1073741825_1001
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
at
org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
at
org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1725)
at
org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1507)
at
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1481)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:720)
2022-09-07 13:49:58,783 WARN hdfs.DataStreamer: Error Recovery for
BP-452161995-NN-1662558403599:blk_1073741825_1001 in pipeline
[DatanodeInfoWithStorage[DN1:9866,DS-XXX,DISK],
DatanodeInfoWithStorage[DN2:9866,DS-XXX,DISK],
DatanodeInfoWithStorage[DN3:9866,DS-XXX,DISK]]: datanode
0(DatanodeInfoWithStorage[DN1:9866,DS-XXX,DISK]) is bad.
2022-09-07 13:49:58,808 WARN hdfs.DFSClient: Error transferring data from
DatanodeInfoWithStorage[DN2:9866,DS-XXX,DISK] to
DatanodeInfoWithStorage[DN4:9866,DS-XXX,DISK]: Connection refused
2022-09-07 13:49:58,996 INFO hdfs.DataStreamer: Exception in
createBlockOutputStream blk_1073741825_1001
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
at
org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
at
org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1725)
at
org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1507)
at
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1481)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:720)
2022-09-07 13:49:58,996 WARN hdfs.DataStreamer: Error Recovery for
BP-452161995-NN-1662558403599:blk_1073741825_1001 in pipeline
[DatanodeInfoWithStorage[DN2:9866,DS-XXX,DISK],
DatanodeInfoWithStorage[DN3:9866,DS-XXX,DISK],
DatanodeInfoWithStorage[DN5:9866,DS-XXX,DISK]]: datanode
0(DatanodeInfoWithStorage[DN2:9866,DS-XXX,DISK]) is bad.
> hdfs dfs -cat /tmp/test;
hello
hello
```
- Determine the new block locations of the 3 replicas
```
> hdfs fsck /tmp/test -files -blocks -locations;
...
0. BP-452161995-NN-1662558403599:blk_1073741825_1004 len=12 Live_repl=3
[DatanodeInfoWithStorage[DN3:9866,DS-XXX,DISK],
DatanodeInfoWithStorage[DN4:9866,DS-XXX,DISK],
DatanodeInfoWithStorage[DN5:9866,DS-XXX,DISK]]
...
```
- Restart the Namenode so that the [block replicas are marked as
"stale"](https://code.amazon.com/packages/Aws157HadoopGit/blobs/c89b29bd421152f0e7e16936f18d9e852895c37a/--/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java#L120)
- enable Namenode BlockManager DEBUG logging by setting Log4J
configuration "log4j.logger.BlockStateChange=DEBUG"
```
sudo systemctl restart hadoop-hdfs-namenode.service;
```
- Restart DN1 & DN2
```
sudo systemctl start hadoop-hdfs-datanode.service;
```
- Check the Namenode logs to confirm the block invalidation is postponed
```
2022-09-07 13:50:58,194 DEBUG BlockStateChange (Block report processor):
BLOCK* invalidateBlocks: postponing invalidation of
blk_1073741825_1001(stored=blk_1073741825_1004) on DN2:9866 because 1
replica(s) are located on nodes with potentially out-of-date block reports
2022-09-07 13:51:06,780 DEBUG BlockStateChange (Block report processor):
BLOCK* invalidateBlocks: postponing invalidation of
blk_1073741825_1001(stored=blk_1073741825_1004) on DN1:9866 because 1
replica(s) are located on nodes with potentially out-of-date block reports
```
- Restart any of DN3, DN4, or DN5 so that they send a block report
- note that the [block must be sufficiently replicated for this code path
to
occur](https://github.com/apache/hadoop/blob/a5adc27c99ffa0ee4f274276afd38f0e7367b9d2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3693)
```
sudo systemctl start hadoop-hdfs-datanode.service;
```
- Check the Namenode logs to validate the invalidation requests were sent to
DN1 & DN2
```
2022-09-07 13:52:07,729 INFO org.apache.hadoop.hdfs.StateChange (IPC Server
handler 26 on default port 8020): BLOCK* registerDatanode: from
DatanodeRegistration(DN4:9866,
datanodeUuid=2792b414-8c97-4a36-bb3c-1bda67ea9f28, infoPort=9864,
infoSecurePort=0, ipcPort=9867,
storageInfo=lv=-57;cid=CID-4227a61e-3071-4e90-a8bd-0ca0ded28a9f;nsid=1332362960;c=1662558403599)
storage 2792b414-8c97-4a36-bb3c-1bda67ea9f28
2022-09-07 13:52:07,729 INFO org.apache.hadoop.net.NetworkTopology (IPC
Server handler 26 on default port 8020): Removing a node: /default-rack/DN4:9866
2022-09-07 13:52:07,730 INFO org.apache.hadoop.net.NetworkTopology (IPC
Server handler 26 on default port 8020): Adding a new node:
/default-rack/DN4:9866
2022-09-07 13:52:07,792 DEBUG BlockStateChange (IPC Server handler 22 on
default port 8020): *BLOCK* NameNode.blockReport: from
DatanodeRegistration(DN4:9866,
datanodeUuid=2792b414-8c97-4a36-bb3c-1bda67ea9f28, infoPort=9864,
infoSecurePort=0, ipcPort=9867,
storageInfo=lv=-57;cid=CID-4227a61e-3071-4e90-a8bd-0ca0ded28a9f;nsid=1332362960;c=1662558403599),
reports.length=2
2022-09-07 13:52:07,793 INFO BlockStateChange (Block report processor):
BLOCK* processReport 0x5b965acbde378e45: Processing first storage report for
DS-XXX from datanode 2792b414-8c97-4a36-bb3c-1bda67ea9f28
2022-09-07 13:52:07,793 DEBUG BlockStateChange (Block report processor):
BLOCK* addStoredBlock: Redundant addStoredBlock request received for
blk_1073741825_1004 on node DN4:9866 size 12
2022-09-07 13:52:07,793 DEBUG BlockStateChange (Block report processor):
BLOCK* invalidateBlock: blk_1073741825_1004(stored=blk_1073741825_1004) on
DN1:9866
2022-09-07 13:52:07,793 DEBUG BlockStateChange (Block report processor):
BLOCK* InvalidateBlocks: add blk_1073741825_1004 to DN1:9866
2022-09-07 13:52:07,793 DEBUG BlockStateChange (Block report processor):
BLOCK* removeStoredBlock: blk_1073741825_1004 from DN1:9866
2022-09-07 13:52:07,795 DEBUG BlockStateChange (Block report processor):
BLOCK* invalidateBlocks: blk_1073741825_1004(stored=blk_1073741825_1004) on
DN1:9866 listed for deletion.
2022-09-07 13:52:07,795 DEBUG BlockStateChange (Block report processor):
BLOCK* invalidateBlock: blk_1073741825_1004(stored=blk_1073741825_1004) on
DN2:9866
2022-09-07 13:52:07,795 DEBUG BlockStateChange (Block report processor):
BLOCK* InvalidateBlocks: add blk_1073741825_1004 to DN2:9866
2022-09-07 13:52:07,795 DEBUG BlockStateChange (Block report processor):
BLOCK* removeStoredBlock: blk_1073741825_1004 from DN2:9866
2022-09-07 13:52:07,795 DEBUG BlockStateChange (Block report processor):
BLOCK* invalidateBlocks: blk_1073741825_1004(stored=blk_1073741825_1004) on
DN2:9866 listed for deletion.
2022-09-07 13:52:07,795 INFO BlockStateChange (Block report processor):
BLOCK* processReport 0x5b965acbde378e45: from storage DS-XXX node
DatanodeRegistration(DN4:9866,
datanodeUuid=2792b414-8c97-4a36-bb3c-1bda67ea9f28, infoPort=9864,
infoSecurePort=0, ipcPort=9867,
storageInfo=lv=-57;cid=CID-4227a61e-3071-4e90-a8bd-0ca0ded28a9f;nsid=1332362960;c=1662558403599),
blocks: 1, hasStaleStorage: false, processing time: 2 msecs,
invalidatedBlocks: 0
2022-09-07 13:52:07,795 INFO BlockStateChange (Block report processor):
BLOCK* processReport 0x5b965acbde378e45: Processing first storage report for
DS-617e1346-8e62-40f0-a35a-5999c3fb2f64 from datanode
2792b414-8c97-4a36-bb3c-1bda67ea9f28
2022-09-07 13:52:07,795 INFO BlockStateChange (Block report processor):
BLOCK* processReport 0x5b965acbde378e45: from storage
DS-617e1346-8e62-40f0-a35a-5999c3fb2f64 node DatanodeRegistration(DN4:9866,
datanodeUuid=2792b414-8c97-4a36-bb3c-1bda67ea9f28, infoPort=9864,
infoSecurePort=0, ipcPort=9867,
storageInfo=lv=-57;cid=CID-4227a61e-3071-4e90-a8bd-0ca0ded28a9f;nsid=1332362960;c=1662558403599),
blocks: 0, hasStaleStorage: false, processing time: 0 msecs,
invalidatedBlocks: 0
2022-09-07 13:52:09,128 DEBUG BlockStateChange (RedundancyMonitor): BLOCK*
neededReconstruction = 0 pendingReconstruction = 0
2022-09-07 13:52:09,128 DEBUG BlockStateChange (RedundancyMonitor): BLOCK*
BlockManager: ask DN2:9866 to delete [blk_1073741825_1004]
2022-09-07 13:52:09,128 DEBUG BlockStateChange (RedundancyMonitor): BLOCK*
BlockManager: ask DN1:9866 to delete [blk_1073741825_1004]
```
- Check the datanode logs (for DN1 & DN2) to validate the "GenerationStamp
not matched" exception occurs
```
2022-09-07 13:50:58,206 INFO org.apache.hadoop.hdfs.server.datanode.DataNode
(BP-452161995-NN-1662558403599 heartbeating to NN/NN:8020): Successfully sent
block report 0x65d28e4f71df62a6, containing 2 storage report(s), of which we
sent 2. The reports had 1 total blocks and used 1 RPC(s). This took 3 msec to
generate and 19 msecs for RPC and NN processing. Got back one command:
FinalizeCommand/5.
2022-09-07 13:50:58,206 INFO org.apache.hadoop.hdfs.server.datanode.DataNode
(BP-452161995-NN-1662558403599 heartbeating to NN/NN:8020): Got finalize
command for block pool BP-452161995-NN-1662558403599
2022-09-07 13:52:10,159 WARN org.apache.hadoop.hdfs.server.datanode.DataNode
(BP-452161995-NN-1662558403599 heartbeating to NN/NN:8020): Error processing
datanode Command
java.io.IOException: Failed to delete 1 (out of 1) replica(s):
0) Failed to delete replica blk_1073741825_1004: GenerationStamp not
matched, existing replica is blk_1073741825_1001
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2135)
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2034)
at
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:734)
at
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:883)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:678)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:849)
at java.lang.Thread.run(Thread.java:750)
```
```
2022-09-07 13:51:06,797 INFO org.apache.hadoop.hdfs.server.datanode.DataNode
(BP-452161995-NN-1662558403599 heartbeating to NN/NN:8020): Successfully sent
block report 0x43c9d07e94b8c90b, containing 2 storage report(s), of which we
sent 2. The reports had 1 total blocks and used 1 RPC(s). This took 4 msec to
generate and 26 msecs for RPC and NN processing. Got back one command:
FinalizeCommand/5.
2022-09-07 13:51:06,797 INFO org.apache.hadoop.hdfs.server.datanode.DataNode
(BP-452161995-NN-1662558403599 heartbeating to NN/NN:8020): Got finalize
command for block pool BP-452161995-NN-1662558403599
2022-09-07 13:52:09,738 WARN org.apache.hadoop.hdfs.server.datanode.DataNode
(BP-452161995-NN-1662558403599 heartbeating to NN/NN:8020): Error processing
datanode Command
java.io.IOException: Failed to delete 1 (out of 1) replica(s):
0) Failed to delete replica blk_1073741825_1004: GenerationStamp not
matched, existing replica is blk_1073741825_1001
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2135)
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2034)
at
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:734)
at
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:883)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:678)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:849)
at java.lang.Thread.run(Thread.java:750)
```
> Use correct GenerationStamp when invalidating corrupt block replicas
> --------------------------------------------------------------------
>
> Key: HDFS-16664
> URL: https://issues.apache.org/jira/browse/HDFS-16664
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kevin Wikant
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> While trying to backport HDFS-16064 to an older Hadoop version, the new unit
> test "testDeleteCorruptReplicaForUnderReplicatedBlock" started failing
> unexpectedly.
> Upon deep diving this unit test failure, I identified a bug in HDFS corrupt
> replica invalidation which results in the following datanode exception:
> {quote}2022-07-16 08:07:52,041 [BP-958471676-X-1657973243350 heartbeating to
> localhost/127.0.0.1:61365] WARN datanode.DataNode
> (BPServiceActor.java:processCommand(887)) - Error processing datanode Command
> java.io.IOException: Failed to delete 1 (out of 1) replica(s):
> 0) Failed to delete replica blk_1073741825_1005: GenerationStamp not matched,
> existing replica is blk_1073741825_1001
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2139)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2034)
> at
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:735)
> at
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:883)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:678)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:849)
> at java.lang.Thread.run(Thread.java:750)
> {quote}
> The issue is that the Namenode is sending wrong generationStamp to the
> datanode. By adding some additional logs, I was able to determine the root
> cause for this:
> * the generationStamp sent in the DNA_INVALIDATE is based on the
> [generationStamp of the block sent in the block
> report|https://github.com/apache/hadoop/blob/8774f178686487007dcf8c418c989b785a529000/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3733]
> * the problem is that the datanode with the corrupt block replica (that
> receives the DNA_INVALIDATE) is not necissarily the same datanode that sent
> the block report
> * this can cause the above exception when the corrupt block replica on the
> datanode receiving the DNA_INVALIDATE & the block replica on the datanode
> that sent the block report have different generationStamps
> The solution is to store the corrupt replicas generationStamp in the
> CorruptReplicasMap, then to extract this correct generationStamp value when
> sending the DNA_INVALIDATE to the datanode
>
> h2. Failed Test - Before the fix
> {quote}> mvn test
> -Dtest=TestDecommission#testDeleteCorruptReplicaForUnderReplicatedBlock
>
> [INFO] Results:
> [INFO]
> [ERROR] Failures:
> [ERROR]
> TestDecommission.testDeleteCorruptReplicaForUnderReplicatedBlock:2035 Node
> 127.0.0.1:61366 failed to complete decommissioning. numTrackedNodes=1 ,
> numPendingNodes=0 , adminState=Decommission In Progress ,
> nodesWithReplica=[127.0.0.1:61366, 127.0.0.1:61419]
> {quote}
> Logs:
> {quote}> cat
> target/surefire-reports/org.apache.hadoop.hdfs.TestDecommission-output.txt |
> grep 'Expected Replicas:|XXX|FINALIZED|Block now|Failed to delete'
> 2022-07-16 08:07:45,891 [Listener at localhost/61378] INFO
> hdfs.TestDecommission
> (TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(1942))
> - Block now has 2 corrupt replicas on [127.0.0.1:61370 , 127.0.0.1:61375] and
> 1 live replica on 127.0.0.1:61366
> 2022-07-16 08:07:45,913 [Listener at localhost/61378] INFO
> hdfs.TestDecommission
> (TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(1974))
> - Block now has 2 corrupt replicas on [127.0.0.1:61370 , 127.0.0.1:61375] and
> 1 decommissioning replica on 127.0.0.1:61366
> XXX invalidateBlock dn=127.0.0.1:61415 , blk=1073741825_1001
> XXX postponeBlock dn=127.0.0.1:61415 , blk=1073741825_1001
> XXX invalidateBlock dn=127.0.0.1:61419 , blk=1073741825_1003
> XXX addToInvalidates dn=127.0.0.1:61419 , blk=1073741825_1003
> XXX addBlocksToBeInvalidated dn=127.0.0.1:61419 , blk=1073741825_1003
> XXX rescanPostponedMisreplicatedBlocks blk=1073741825_1005
> XXX DNA_INVALIDATE dn=/127.0.0.1:61419 , blk=1073741825_1003
> XXX invalidate(on DN) dn=/127.0.0.1:61419 , invalidBlk=blk_1073741825_1003 ,
> blkByIdAndGenStamp = FinalizedReplica, blk_1073741825_1003, FINALIZED
> 2022-07-16 08:07:49,084 [BP-958471676-X-1657973243350 heartbeating to
> localhost/127.0.0.1:61365] INFO impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:deleteAsync(226)) - Scheduling
> blk_1073741825_1003 replica FinalizedReplica, blk_1073741825_1003, FINALIZED
> XXX addBlock dn=127.0.0.1:61419 , blk=1073741825_1005 *<<< block report is
> coming from 127.0.0.1:61419 which has genStamp=1005*
> XXX invalidateCorruptReplicas dn=127.0.0.1:61415 ,
> reported_blk=1073741825_1005 *<<< corrupt replica is on 127.0.0.1:61415
> which is expecting genStamp=1001*
> XXX addToInvalidates dn=127.0.0.1:61415 , blk=1073741825_1005
> 2022-07-16 08:07:49,431 [DatanodeAdminMonitor-0] INFO BlockStateChange
> (DatanodeAdminManager.java:logBlockReplicationInfo(417)) - Block:
> blk_1073741825_1005, Expected Replicas: 2, live replicas: 1, corrupt
> replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1,
> maintenance replicas: 0, live entering maintenance replicas: 0, excess
> replicas: 0, Is Open File: false, Datanodes having this block:
> 127.0.0.1:61366 127.0.0.1:61419 , Current Datanode: 127.0.0.1:61366, Is
> current datanode decommissioning: true, Is current datanode entering
> maintenance: false
> XXX addBlocksToBeInvalidated dn=127.0.0.1:61415 , blk=1073741825_1005 *<<<
> Namenode sends wrong genStamp to 127.0.0.1:61415*
> XXX DNA_INVALIDATE dn=/127.0.0.1:61415 , blk=1073741825_1005
> XXX invalidate(on DN) dn=/127.0.0.1:61415 , invalidBlk=blk_1073741825_1005 ,
> blkByIdAndGenStamp = null
> XXX invalidate(on DN) dn=/127.0.0.1:61415 , invalidBlk=blk_1073741825_1005 ,
> blkById = FinalizedReplica, blk_1073741825_1001, FINALIZED
> 2022-07-16 08:07:52,041 [BP-958471676-X-1657973243350 heartbeating to
> localhost/127.0.0.1:61365] WARN datanode.DataNode
> (BPServiceActor.java:processCommand(887)) - Error processing datanode Command
> java.io.IOException: Failed to delete 1 (out of 1) replica(s):
> 0) Failed to delete replica blk_1073741825_1005: GenerationStamp not matched,
> existing replica is blk_1073741825_1001
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2139)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2034)
> at
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:735)
> at
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:883)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:678)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:849)
> at java.lang.Thread.run(Thread.java:750)
> 2022-07-16 08:07:52,384 [DataXceiver for client at /127.0.0.1:61434
> [Receiving block BP-958471676-X-1657973243350:blk_1073741825_1005]] INFO
> datanode.DataNode (DataXceiver.java:writeBlock(939)) - opWriteBlock
> BP-958471676-X-1657973243350:blk_1073741825_1005 received exception
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block
> BP-958471676-X-1657973243350:blk_1073741825_1005 already exists in state
> FINALIZED and thus cannot be created.
> 2022-07-16 08:07:52,385 [DataXceiver for client at /127.0.0.1:61434
> [Receiving block BP-958471676-X-1657973243350:blk_1073741825_1005]] INFO
> datanode.DataNode (DataXceiver.java:run(307)) - 127.0.0.1:61415:DataXceiver
> error processing WRITE_BLOCK operation src: /127.0.0.1:61434 dst:
> /127.0.0.1:61415;
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block
> BP-958471676-X-1657973243350:blk_1073741825_1005 already exists in state
> FINALIZED and thus cannot be created.
> 2022-07-16 08:07:54,422 [DatanodeAdminMonitor-0] INFO BlockStateChange
> (DatanodeAdminManager.java:logBlockReplicationInfo(417)) - Block:
> blk_1073741825_1005, Expected Replicas: 2, live replicas: 1, corrupt
> replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1,
> maintenance replicas: 0, live entering maintenance replicas: 0, excess
> replicas: 0, Is Open File: false, Datanodes having this block:
> 127.0.0.1:61366 127.0.0.1:61419 , Current Datanode: 127.0.0.1:61366, Is
> current datanode decommissioning: true, Is current datanode entering
> maintenance: false
> ...
> 2022-07-16 08:08:24,426 [DatanodeAdminMonitor-0] INFO BlockStateChange
> (DatanodeAdminManager.java:logBlockReplicationInfo(417)) - Block:
> blk_1073741825_1005, Expected Replicas: 2, live replicas: 1, corrupt
> replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1,
> maintenance replicas: 0, live entering maintenance replicas: 0, excess
> replicas: 0, Is Open File: false, Datanodes having this block:
> 127.0.0.1:61366 127.0.0.1:61419 , Current Datanode: 127.0.0.1:61366, Is
> current datanode decommissioning: true, Is current datanode entering
> maintenance: false
> {quote}
> Note the inline comments above which illustrate the bug
> h2. Successful Test - After the fix
> {quote}> mvn test
> -Dtest=TestDecommission#testDeleteCorruptReplicaForUnderReplicatedBlock
> [INFO] Results:
> [INFO]
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
> {quote}
> Logs:
> {quote}> cat
> target/surefire-reports/org.apache.hadoop.hdfs.TestDecommission-output.txt |
> grep 'Expected Replicas:|XXX|FINALIZED|Block now|Failed to delete'
> 2022-07-16 07:54:30,648 [Listener at localhost/60376] INFO
> hdfs.TestDecommission
> (TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(1942))
> - Block now has 2 corrupt replicas on [127.0.0.1:60364 , 127.0.0.1:60368] and
> 1 live replica on 127.0.0.1:60373
> 2022-07-16 07:54:30,669 [Listener at localhost/60376] INFO
> hdfs.TestDecommission
> (TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(1974))
> - Block now has 2 corrupt replicas on [127.0.0.1:60364 , 127.0.0.1:60368] and
> 1 decommissioning replica on 127.0.0.1:60373
> XXX invalidateBlock dn=127.0.0.1:60423 , blk=1073741825_1001
> XXX postponeBlock dn=127.0.0.1:60423 , blk=1073741825_1001
> XXX invalidateBlock dn=127.0.0.1:60427 , blk=1073741825_1003
> XXX addToInvalidates dn=127.0.0.1:60427 , blk=1073741825_1003
> XXX addBlocksToBeInvalidated dn=127.0.0.1:60427 , blk=1073741825_1003
> XXX rescanPostponedMisreplicatedBlocks blk=1073741825_1005
> XXX DNA_INVALIDATE dn=/127.0.0.1:60427 , blk=1073741825_1003
> XXX invalidate(on DN) dn=/127.0.0.1:60427 , invalidBlk=blk_1073741825_1003 ,
> blkByIdAndGenStamp = FinalizedReplica, blk_1073741825_1003, FINALIZED
> 2022-07-16 07:54:32,831 [BP-1469857843-X-1657972447604 heartbeating to
> localhost/127.0.0.1:60363] INFO impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:deleteAsync(226)) - Scheduling
> blk_1073741825_1003 replica FinalizedReplica, blk_1073741825_1003, FINALIZED
> 2022-07-16 07:54:33,772 [DatanodeAdminMonitor-0] INFO BlockStateChange
> (DatanodeAdminManager.java:logBlockReplicationInfo(417)) - Block:
> blk_1073741825_1005, Expected Replicas: 2, live replicas: 0, corrupt
> replicas: 1, decommissioned replicas: 0, decommissioning replicas: 1,
> maintenance replicas: 0, live entering maintenance replicas: 0, excess
> replicas: 0, Is Open File: false, Datanodes having this block:
> 127.0.0.1:60373 127.0.0.1:60423 , Current Datanode: 127.0.0.1:60373, Is
> current datanode decommissioning: true, Is current datanode entering
> maintenance: false
> XXX addBlock dn=127.0.0.1:60427 , blk=1073741825_1005
> XXX invalidateCorruptReplicas dn=127.0.0.1:60423 ,
> reported_blk=1073741825_1005
> XXX getCorruptReplicaGenerationStamp dn=127.0.0.1:60423 , genStamp=1001
> XXX addToInvalidates dn=127.0.0.1:60423 , blk=1073741825_1001
> XXX addBlocksToBeInvalidated dn=127.0.0.1:60423 , blk=1073741825_1001
> XXX DNA_INVALIDATE dn=/127.0.0.1:60423 , blk=1073741825_1001
> XXX invalidate(on DN) dn=/127.0.0.1:60423 , invalidBlk=blk_1073741825_1001 ,
> blkByIdAndGenStamp = FinalizedReplica, blk_1073741825_1001, FINALIZED
> 2022-07-16 07:54:35,796 [BP-1469857843-X-1657972447604 heartbeating to
> localhost/127.0.0.1:60363] INFO impl.FsDatasetAsyncDiskService
> (FsDatasetAsyncDiskService.java:deleteAsync(226)) - Scheduling
> blk_1073741825_1001 replica FinalizedReplica, blk_1073741825_1001, FINALIZED
> XXX addBlock dn=127.0.0.1:60423 , blk=1073741825_1005
> 2022-07-16 07:54:40,768 [Listener at localhost/60430] INFO
> hdfs.TestDecommission
> (TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(2050))
> - Block now has 2 live replicas on [127.0.0.1:60423 , 127.0.0.1:60427] and 1
> decommissioned replica on 127.0.0.1:60373
> {quote}
> Using "getCorruptReplicaGenerationStamp" allows the Namenode to get the
> correct generationStamp for the corrupt block replica
>
> h2. Why does unit test failure not reproduce in Hadoop trunk?
> In trunk the block invalidation does not get [postponed
> here|https://github.com/apache/hadoop/blob/8774f178686487007dcf8c418c989b785a529000/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L1986]
> Instead, both corrupt replicas are invalidated immediately during the first
> block report:
> {quote}> cat
> target/surefire-reports/org.apache.hadoop.hdfs.TestDecommission-output.txt |
> grep XXX
> XXX addToInvalidates dn=127.0.0.1:57051 , blk=1073741825_1001
> XXX addToInvalidates dn=127.0.0.1:57055 , blk=1073741825_1003
> {quote}
> The reason for this difference in behavior is not clear at this point
> h2. Additional Info
> * stack trace for when corrupt replica gets immediately invalidated on
> datanode restart (i.e. first block report)
>
> {quote}java.lang.Thread.getStackTrace(Thread.java:1564)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addToInvalidates(BlockManager.java:1670)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.invalidateBlock(BlockManager.java:1850)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.markBlockAsCorrupt(BlockManager.java:1816)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processFirstBlockReport(BlockManager.java:2929)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:2665)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.lambda$blockReport$0(NameNodeRpcServer.java:1588)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.processQueue(BlockManager.java:5066)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5045)
> {quote}
>
> * stack trace for after corrupt replica is removed from datanode & datanode
> notifies namenode (i.e. incremental block report)
> ** this is where the issue begins because the DN which sends the IBR is not
> the same as the DN with the remaining corrupt replica
> {quote}java.lang.Thread.getStackTrace(Thread.java:1564)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addToInvalidates(BlockManager.java:1670)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.invalidateBlock(BlockManager.java:1850)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.invalidateCorruptReplicas(BlockManager.java:3477)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3436)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:4094)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:4023)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4151)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4118)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:4800)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$1.run(NameNodeRpcServer.java:1643)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.processQueue(BlockManager.java:5066)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5045)
> {quote}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]