[jira] [Comment Edited] (HDFS-17658) HDFS decommissioning does not consider if Under Construction blocks are sufficiently replicated which causes HDFS Data Loss

Kevin Wikant (Jira) Tue, 05 Nov 2024 11:50:25 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-17658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17895780#comment-17895780
 ]


Kevin Wikant edited comment on HDFS-17658 at 11/5/24 7:49 PM:
--------------------------------------------------------------

h2. Test#4: Create Block & Repeatedly Append -> Close DFSOutputStream -> 
Re-open Block & Repeatedly Append -> Close DFSOutputStream -> Decommission 
Datanode (dfs.replication = 1)
 * Create a new HDFS cluster (with 0 blocks to begin with)
 * Run the test script to execute the desired test case

 
{quote}2024-11-05 17:16:32 Append Success  100 KB
2024-11-05 17:16:32 Append Success  200 KB
2024-11-05 17:16:32 Append Success  300 KB
2024-11-05 17:16:33 Append Success  400 KB
2024-11-05 17:16:33 Append Success  500 KB
2024-11-05 17:16:33 Append Success  600 KB
{quote}
* Check the namenode logs to identify which datanode(s) have the block 
replica(s)

 

 
{quote}2024-11-05 17:16:32,613 INFO hdfs.StateChange: BLOCK* allocate 
blk_1073741825_1001, replicas=172.31.88.149:9866 for 
/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654
{quote}
* Start decommissioning all the datanode(s) with block replica(s)

 
 * Check the namenode logs to check if the block replica(s) are considered by 
the DatanodeAdminManager & are replicated to other datanode(s) before the 
containing datanode(s) become decommissioned

 
{quote}2024-11-05 17:17:37,773 INFO blockmanagement.DatanodeAdminManager: 
Starting decommission of 172.31.88.149:9866 
[DISK]DS-7b8aa0df-3522-4108-a553-37946b23eadb:NORMAL:172.31.88.149:9866 with 1 
blocks
2024-11-05 17:17:37,773 INFO blockmanagement.DatanodeAdminManager: Starting 
decommission of 172.31.88.149:9866 
[DISK]DS-43e2de52-232a-4437-a008-ad7898c816d7:NORMAL:172.31.88.149:9866 with 0 
blocks
2024-11-05 17:18:07,749 INFO blockmanagement.DatanodeAdminDefaultMonitor: 
Remove any stale pending Node same as added node 172.31.88.149:9866 with UUID 
20ac1984-5bf0-40f1-9488-fbd23c1e7cff which is in state Decommission In Progress 
and isAlive is true
2024-11-05 17:18:07,751 INFO BlockStateChange: Block: blk_1073741825_1002, 
Expected Replicas: 1, live replicas: 0, corrupt replicas: 0, decommissioned 
replicas: 0, decommissioning replicas: 1, maintenance replicas: 0, live 
entering maintenance replicas: 0, replicas on stale nodes:0, readonly replicas: 
0, excess replicas: 0, Is Open File: false, Datanodes having this block: 
172.31.88.149:9866 , Current Datanode: 172.31.88.149:9866, Is current datanode 
decommissioning: true, Is current datanode entering maintenance: false
2024-11-05 17:18:07,751 INFO blockmanagement.DatanodeAdminDefaultMonitor: Node 
172.31.88.149:9866 still has 1 blocks to replicate before it is a candidate to 
finish Decommission In Progress.
{quote}
* Terminate the decommissioned datanode(s). Then validate if there was any HDFS 
data loss

 

HDFS file is still readable because the block was replicated to another live 
datanode _172.31.81.19_
h3. DEBUG Logs
 * blk_1073741825_1001 (creation + Removing stale replica)

{quote}2024-11-05 17:16:32,613 DEBUG hdfs.StateChange: DIR* 
FSDirectory.addBlock: /tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654 with 
blk_1073741825_1001 block is added to the in-memory file system
2024-11-05 17:16:32,613 INFO hdfs.StateChange: BLOCK* allocate 
blk_1073741825_1001, replicas=172.31.88.149:9866 for 
/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654
2024-11-05 17:16:32,613 DEBUG namenode.FSEditLog: logEdit [RpcEdit 
op:AddBlockOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
penultimateBlock=NULL, lastBlock=blk_1073741825_1001, RpcClientId=, 
RpcCallId=-2] call:Call#1 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378|#1 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:32,613 DEBUG hdfs.StateChange: persistNewBlock: 
/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654 with new block 
blk_1073741825_1001, current total block count is 1
2024-11-05 17:16:32,614 DEBUG namenode.FSEditLog: doEditTx() op=AddBlockOp 
[path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
penultimateBlock=NULL, lastBlock=blk_1073741825_1001, RpcClientId=, 
RpcCallId=-2] txid=57
2024-11-05 17:16:32,808 DEBUG namenode.FSEditLog: logEdit [RpcEdit 
op:UpdateBlocksOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1001], RpcClientId=, RpcCallId=-2] call:Call#3 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.fsync from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:32,809 DEBUG namenode.FSEditLog: logSync [RpcEdit 
op:UpdateBlocksOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1001], RpcClientId=, RpcCallId=-2] call:Call#3 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.fsync from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:32,809 DEBUG namenode.FSEditLog: doEditTx() op=UpdateBlocksOp 
[path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1001], RpcClientId=, RpcCallId=-2] txid=58
2024-11-05 17:16:32,840 INFO namenode.FSNamesystem: BLOCK* blk_1073741825_1001 
is COMMITTED but not COMPLETE(numNodes= 0 <  minimum = 1) in file 
/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654
2024-11-05 17:16:32,851 DEBUG blockmanagement.BlockManager: Reported block 
blk_1073741825_1001 on 172.31.88.149:9866 size 307200 replicaState = FINALIZED
2024-11-05 17:16:32,851 DEBUG blockmanagement.BlockManager: Reported block 
blk_1073741825_1001 on 172.31.88.149:9866 size 307200 replicaState = FINALIZED
2024-11-05 17:16:32,851 DEBUG BlockStateChange: BLOCK* addStoredBlock: 
172.31.88.149:9866 is added to blk_1073741825_1001 (size=307200)
2024-11-05 17:16:32,851 DEBUG BlockStateChange: BLOCK* block RECEIVED_BLOCK: 
blk_1073741825_1001 is received from 172.31.88.149:9866
2024-11-05 17:16:33,244 DEBUG namenode.FSEditLog: logEdit [RpcEdit op:CloseOp 
[length=0, inodeId=0, path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
replication=1, mtime=1730826993244, atime=1730826992556, blockSize=268435456, 
blocks=[blk_1073741825_1001], permissions=root:hdfsadmingroup:rw-r-{-}r{-}-, 
aclEntries=null, clientName=null, clientMachine=null, overwrite=false, 
storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=-12345] 
call:Call#5 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete 
from ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,244 DEBUG namenode.FSEditLog: doEditTx() op=CloseOp 
[length=0, inodeId=0, path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
replication=1, mtime=1730826993244, atime=1730826992556, blockSize=268435456, 
blocks=[blk_1073741825_1001], permissions=root:hdfsadmingroup:rw-r-{-}r{-}-, 
aclEntries=null, clientName=null, clientMachine=null, overwrite=false, 
storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=59] txid=59
2024-11-05 17:16:33,244 DEBUG namenode.FSEditLog: logSync [RpcEdit op:CloseOp 
[length=0, inodeId=0, path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
replication=1, mtime=1730826993244, atime=1730826992556, blockSize=268435456, 
blocks=[blk_1073741825_1001], permissions=root:hdfsadmingroup:rw-r-{-}r{-}-, 
aclEntries=null, clientName=null, clientMachine=null, overwrite=false, 
storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=59] 
call:Call#5 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete 
from ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,256 DEBUG hdfs.StateChange: DIR* NameSystem.appendFile: 
file /tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654 for 
DFSClient_NONMAPREDUCE_1132039389_1 at 172.31.80.140 block 
BP-1479116858-172.31.80.140-1730826573126:blk_1073741825_1001 block size 307200
2024-11-05 17:16:33,295 INFO namenode.FSNamesystem: 
updatePipeline(blk_1073741825_1001, newGS=1002, newLength=307200, 
newNodes=[172.31.88.149:9866], client=DFSClient_NONMAPREDUCE_1132039389_1)
2024-11-05 17:16:33,296 DEBUG BlockStateChange: BLOCK* Removing stale replica 
ReplicaUC[[DISK]DS-7b8aa0df-3522-4108-a553-37946b23eadb:NORMAL:172.31.88.149:9866|RBW]
 of blk_1073741825_1001
2024-11-05 17:16:33,296 INFO namenode.FSNamesystem: 
updatePipeline(blk_1073741825_1001 => blk_1073741825_1002) success
{quote}
 
 * blk_1073741825_1002 (append)

{quote}2024-11-05 17:16:33,295 DEBUG BlockStateChange: BLOCK* 
removeStoredBlock: blk_1073741825_1002 from 172.31.88.149:9866
2024-11-05 17:16:33,296 DEBUG namenode.FSEditLog: logEdit [RpcEdit 
op:UpdateBlocksOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1002], RpcClientId=12071ade-7bba-4e54-9c39-39ac40ba5172, 
RpcCallId=8] call:Call#8 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.updatePipeline from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,296 DEBUG namenode.FSEditLog: logSync [RpcEdit 
op:UpdateBlocksOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1002], RpcClientId=12071ade-7bba-4e54-9c39-39ac40ba5172, 
RpcCallId=8] call:Call#8 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.updatePipeline from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,296 INFO namenode.FSNamesystem: 
updatePipeline(blk_1073741825_1001 => blk_1073741825_1002) success
2024-11-05 17:16:33,296 DEBUG namenode.FSEditLog: doEditTx() op=UpdateBlocksOp 
[path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1002], RpcClientId=12071ade-7bba-4e54-9c39-39ac40ba5172, 
RpcCallId=8] txid=62
2024-11-05 17:16:33,306 DEBUG namenode.FSEditLog: logEdit [RpcEdit 
op:UpdateBlocksOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1002], RpcClientId=, RpcCallId=-2] call:Call#9 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.fsync from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,306 DEBUG namenode.FSEditLog: logSync [RpcEdit 
op:UpdateBlocksOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1002], RpcClientId=, RpcCallId=-2] call:Call#9 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.fsync from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,306 DEBUG namenode.FSEditLog: doEditTx() op=UpdateBlocksOp 
[path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1002], RpcClientId=, RpcCallId=-2] txid=63
2024-11-05 17:16:33,320 DEBUG blockmanagement.BlockManager: Reported block 
blk_1073741825_1002 on 172.31.88.149:9866 size 614400 replicaState = FINALIZED
2024-11-05 17:16:33,320 DEBUG blockmanagement.BlockManager: Reported block 
blk_1073741825_1002 on 172.31.88.149:9866 size 614400 replicaState = FINALIZED
2024-11-05 17:16:33,320 DEBUG BlockStateChange: BLOCK* addStoredBlock: 
172.31.88.149:9866 is added to blk_1073741825_1002 (size=409600)2024-11-05 
17:16:33,320 DEBUG BlockStateChange: BLOCK* block RECEIVED_BLOCK: 
blk_1073741825_1002 is received from 172.31.88.149:9866
2024-11-05 17:16:33,321 DEBUG namenode.FSEditLog: logEdit [RpcEdit op:CloseOp 
[length=0, inodeId=0, path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
replication=1, mtime=1730826993321, atime=1730826992556, blockSize=268435456, 
blocks=[blk_1073741825_1002], permissions=root:hdfsadmingroup:rw-r-{-}r{-}-, 
aclEntries=null, clientName=null, clientMachine=null, overwrite=false, 
storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=-12345] 
call:Call#10 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete 
from ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,321 DEBUG namenode.FSEditLog: doEditTx() op=CloseOp 
[length=0, inodeId=0, path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
replication=1, mtime=1730826993321, atime=1730826992556, blockSize=268435456, 
blocks=[blk_1073741825_1002], permissions=root:hdfsadmingroup:rw-r-{-}r{-}-, 
aclEntries=null, clientName=null, clientMachine=null, overwrite=false, 
storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=64] txid=64
2024-11-05 17:16:33,321 DEBUG namenode.FSEditLog: logSync [RpcEdit op:CloseOp 
[length=0, inodeId=0, path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
replication=1, mtime=1730826993321, atime=1730826992556, blockSize=268435456, 
blocks=[blk_1073741825_1002], permissions=root:hdfsadmingroup:rw-r-{-}r{-}-, 
aclEntries=null, clientName=null, clientMachine=null, overwrite=false, 
storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=64] 
call:Call#10 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete 
from ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
{quote}
 
 * blk_1073741825_* (decommissioning)

{quote}2024-11-05 17:18:07,751 DEBUG BlockStateChange: BLOCK* 
NameSystem.LowRedundancyBlock.add: blk_1073741825_1002 has only 0 replicas and 
need 1 replicas so is added to neededReconstructions at priority level 0
2024-11-05 17:18:07,751 INFO BlockStateChange: Block: blk_1073741825_1002, 
Expected Replicas: 1, live replicas: 0, corrupt replicas: 0, decommissioned 
replicas: 0, decommissioning replicas: 1, maintenance replicas: 0, live 
entering maintenance replicas: 0, replicas on stale nodes:0, readonly replicas: 
0, excess replicas: 0, Is Open File: false, Datanodes having this block: 
172.31.88.149:9866 , Current Datanode: 172.31.88.149:9866, Is current datanode 
decommissioning: true, Is current datanode entering maintenance: false
2024-11-05 17:18:07,807 DEBUG blockmanagement.BlockReconstructionWork: Creating 
a ReplicationWork to reconstruct blk_1073741825_1002
2024-11-05 17:18:07,808 DEBUG BlockStateChange: BLOCK* block 
blk_1073741825_1002 is moved from neededReconstruction to pendingReconstruction
2024-11-05 17:18:07,808 DEBUG BlockStateChange: BLOCK* 
NameSystem.LowRedundancyBlock.remove: Removing block blk_1073741825_1002 from 
priority queue 0
2024-11-05 17:18:07,808 DEBUG BlockStateChange: BLOCK* ask [172.31.88.149:9866] 
to replicate blk_1073741825_1002 to datanode(s) 172.31.81.19:9866
2024-11-05 17:18:08,885 DEBUG blockmanagement.BlockManager: Removing pending 
reconstruction for blk_1073741825_1002
2024-11-05 17:18:08,885 DEBUG blockmanagement.BlockManager: Reported block 
blk_1073741825_1002 on 172.31.81.19:9866 size 614400 replicaState = FINALIZED
2024-11-05 17:18:08,885 DEBUG blockmanagement.BlockManager: Reported block 
blk_1073741825_1002 on 172.31.81.19:9866 size 614400 replicaState = FINALIZED
2024-11-05 17:18:08,885 DEBUG BlockStateChange: BLOCK* addStoredBlock: 
172.31.81.19:9866 is added to blk_1073741825_1002 (size=614400)
2024-11-05 17:18:08,885 DEBUG BlockStateChange: BLOCK* block RECEIVED_BLOCK: 
blk_1073741825_1002 is received from 172.31.81.19:9866
2024-11-05 17:18:44,130 DEBUG BlockStateChange: BLOCK* 
ExcessRedundancyMap.add(172.31.88.149:9866, blk_1073741825_1002)
2024-11-05 17:18:44,131 DEBUG BlockStateChange: BLOCK* InvalidateBlocks: add 
blk_1073741825_1002 to 172.31.88.149:9866
2024-11-05 17:18:44,131 DEBUG BlockStateChange: BLOCK* 
chooseExcessRedundancies: 
([DISK]DS-7b8aa0df-3522-4108-a553-37946b23eadb:NORMAL:172.31.88.149:9866, 
blk_1073741825_1002) is added to invalidated blocks set
2024-11-05 17:18:46,811 DEBUG BlockStateChange: BLOCK* BlockManager: ask 
172.31.88.149:9866 to delete [blk_1073741825_1002]
2024-11-05 17:18:53,737 DEBUG BlockStateChange: BLOCK* removeStoredBlock: 
blk_1073741825_1002 from 172.31.88.149:9866
2024-11-05 17:18:53,737 DEBUG BlockStateChange: BLOCK* 
ExcessRedundancyMap.remove(172.31.88.149:9866, blk_1073741825_1002)
2024-11-05 17:18:53,737 DEBUG BlockStateChange: BLOCK* block DELETED_BLOCK: 
blk_1073741825_1002 is received from 172.31.88.149:9866
2024-11-05 17:18:54,138 DEBUG blockmanagement.BlockManager: blocks = 
[blk_1073741825_1002]
{quote}
 


was (Author: kevinwikant):
h2. Test#4: Create Block & Repeatedly Append -> Close DFSOutputStream -> 
Re-open Block & Repeatedly Append -> Close DFSOutputStream -> Decommission 
Datanode (dfs.replication = 1)
 * Create a new HDFS cluster (with 0 blocks to begin with)
 * Run the test script to execute the desired test case

{quote}2024-11-05 17:16:32 Append Success  100 KB
2024-11-05 17:16:32 Append Success  200 KB
2024-11-05 17:16:32 Append Success  300 KB
2024-11-05 17:16:33 Append Success  400 KB
2024-11-05 17:16:33 Append Success  500 KB
2024-11-05 17:16:33 Append Success  600 KB
{quote} * Check the namenode logs to identify which datanode(s) have the block 
replica(s)

{quote}2024-11-05 17:16:32,613 INFO hdfs.StateChange: BLOCK* allocate 
blk_1073741825_1001, replicas=172.31.88.149:9866 for 
/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654
{quote} * Start decommissioning all the datanode(s) with block replica(s)
 * Check the namenode logs to check if the block replica(s) are considered by 
the DatanodeAdminManager & are replicated to other datanode(s) before the 
containing datanode(s) become decommissioned

{quote}2024-11-05 17:17:37,773 INFO blockmanagement.DatanodeAdminManager: 
Starting decommission of 172.31.88.149:9866 
[DISK]DS-7b8aa0df-3522-4108-a553-37946b23eadb:NORMAL:172.31.88.149:9866 with 1 
blocks
2024-11-05 17:17:37,773 INFO blockmanagement.DatanodeAdminManager: Starting 
decommission of 172.31.88.149:9866 
[DISK]DS-43e2de52-232a-4437-a008-ad7898c816d7:NORMAL:172.31.88.149:9866 with 0 
blocks
2024-11-05 17:18:07,749 INFO blockmanagement.DatanodeAdminDefaultMonitor: 
Remove any stale pending Node same as added node 172.31.88.149:9866 with UUID 
20ac1984-5bf0-40f1-9488-fbd23c1e7cff which is in state Decommission In Progress 
and isAlive is true
2024-11-05 17:18:07,751 INFO BlockStateChange: Block: blk_1073741825_1002, 
Expected Replicas: 1, live replicas: 0, corrupt replicas: 0, decommissioned 
replicas: 0, decommissioning replicas: 1, maintenance replicas: 0, live 
entering maintenance replicas: 0, replicas on stale nodes:0, readonly replicas: 
0, excess replicas: 0, Is Open File: false, Datanodes having this block: 
172.31.88.149:9866 , Current Datanode: 172.31.88.149:9866, Is current datanode 
decommissioning: true, Is current datanode entering maintenance: false
2024-11-05 17:18:07,751 INFO blockmanagement.DatanodeAdminDefaultMonitor: Node 
172.31.88.149:9866 still has 1 blocks to replicate before it is a candidate to 
finish Decommission In Progress.
{quote} * Terminate the decommissioned datanode(s). Then validate if there was 
any HDFS data loss

HDFS file is still readable because the block was replicated to another live 
datanode _172.31.81.19_
h3. DEBUG Logs
 * blk_1073741825_1001 (creation + Removing stale replica)

{quote}2024-11-05 17:16:32,613 DEBUG hdfs.StateChange: DIR* 
FSDirectory.addBlock: /tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654 with 
blk_1073741825_1001 block is added to the in-memory file system
2024-11-05 17:16:32,613 INFO hdfs.StateChange: BLOCK* allocate 
blk_1073741825_1001, replicas=172.31.88.149:9866 for 
/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654
2024-11-05 17:16:32,613 DEBUG namenode.FSEditLog: logEdit [RpcEdit 
op:AddBlockOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
penultimateBlock=NULL, lastBlock=blk_1073741825_1001, RpcClientId=, 
RpcCallId=-2] call:Call#1 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:32,613 DEBUG hdfs.StateChange: persistNewBlock: 
/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654 with new block 
blk_1073741825_1001, current total block count is 1
2024-11-05 17:16:32,614 DEBUG namenode.FSEditLog: doEditTx() op=AddBlockOp 
[path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
penultimateBlock=NULL, lastBlock=blk_1073741825_1001, RpcClientId=, 
RpcCallId=-2] txid=57
2024-11-05 17:16:32,808 DEBUG namenode.FSEditLog: logEdit [RpcEdit 
op:UpdateBlocksOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1001], RpcClientId=, RpcCallId=-2] call:Call#3 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.fsync from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:32,809 DEBUG namenode.FSEditLog: logSync [RpcEdit 
op:UpdateBlocksOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1001], RpcClientId=, RpcCallId=-2] call:Call#3 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.fsync from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:32,809 DEBUG namenode.FSEditLog: doEditTx() op=UpdateBlocksOp 
[path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1001], RpcClientId=, RpcCallId=-2] txid=58
2024-11-05 17:16:32,840 INFO namenode.FSNamesystem: BLOCK* blk_1073741825_1001 
is COMMITTED but not COMPLETE(numNodes= 0 <  minimum = 1) in file 
/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654
2024-11-05 17:16:32,851 DEBUG blockmanagement.BlockManager: Reported block 
blk_1073741825_1001 on 172.31.88.149:9866 size 307200 replicaState = FINALIZED
2024-11-05 17:16:32,851 DEBUG blockmanagement.BlockManager: Reported block 
blk_1073741825_1001 on 172.31.88.149:9866 size 307200 replicaState = FINALIZED
2024-11-05 17:16:32,851 DEBUG BlockStateChange: BLOCK* addStoredBlock: 
172.31.88.149:9866 is added to blk_1073741825_1001 (size=307200)
2024-11-05 17:16:32,851 DEBUG BlockStateChange: BLOCK* block RECEIVED_BLOCK: 
blk_1073741825_1001 is received from 172.31.88.149:9866
2024-11-05 17:16:33,244 DEBUG namenode.FSEditLog: logEdit [RpcEdit op:CloseOp 
[length=0, inodeId=0, path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
replication=1, mtime=1730826993244, atime=1730826992556, blockSize=268435456, 
blocks=[blk_1073741825_1001], permissions=root:hdfsadmingroup:rw-r--r--, 
aclEntries=null, clientName=null, clientMachine=null, overwrite=false, 
storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=-12345] 
call:Call#5 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete 
from ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,244 DEBUG namenode.FSEditLog: doEditTx() op=CloseOp 
[length=0, inodeId=0, path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
replication=1, mtime=1730826993244, atime=1730826992556, blockSize=268435456, 
blocks=[blk_1073741825_1001], permissions=root:hdfsadmingroup:rw-r--r--, 
aclEntries=null, clientName=null, clientMachine=null, overwrite=false, 
storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=59] txid=59
2024-11-05 17:16:33,244 DEBUG namenode.FSEditLog: logSync [RpcEdit op:CloseOp 
[length=0, inodeId=0, path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
replication=1, mtime=1730826993244, atime=1730826992556, blockSize=268435456, 
blocks=[blk_1073741825_1001], permissions=root:hdfsadmingroup:rw-r--r--, 
aclEntries=null, clientName=null, clientMachine=null, overwrite=false, 
storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=59] 
call:Call#5 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete 
from ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,256 DEBUG hdfs.StateChange: DIR* NameSystem.appendFile: 
file /tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654 for 
DFSClient_NONMAPREDUCE_1132039389_1 at 172.31.80.140 block 
BP-1479116858-172.31.80.140-1730826573126:blk_1073741825_1001 block size 307200
2024-11-05 17:16:33,295 INFO namenode.FSNamesystem: 
updatePipeline(blk_1073741825_1001, newGS=1002, newLength=307200, 
newNodes=[172.31.88.149:9866], client=DFSClient_NONMAPREDUCE_1132039389_1)
2024-11-05 17:16:33,296 DEBUG BlockStateChange: BLOCK* Removing stale replica 
ReplicaUC[[DISK]DS-7b8aa0df-3522-4108-a553-37946b23eadb:NORMAL:172.31.88.149:9866|RBW]
 of blk_1073741825_1001
2024-11-05 17:16:33,296 INFO namenode.FSNamesystem: 
updatePipeline(blk_1073741825_1001 => blk_1073741825_1002) success
{quote}
 
 * blk_1073741825_1002 (append)

{quote}2024-11-05 17:16:33,295 DEBUG BlockStateChange: BLOCK* 
removeStoredBlock: blk_1073741825_1002 from 172.31.88.149:9866
2024-11-05 17:16:33,296 DEBUG namenode.FSEditLog: logEdit [RpcEdit 
op:UpdateBlocksOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1002], RpcClientId=12071ade-7bba-4e54-9c39-39ac40ba5172, 
RpcCallId=8] call:Call#8 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.updatePipeline from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,296 DEBUG namenode.FSEditLog: logSync [RpcEdit 
op:UpdateBlocksOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1002], RpcClientId=12071ade-7bba-4e54-9c39-39ac40ba5172, 
RpcCallId=8] call:Call#8 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.updatePipeline from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,296 INFO namenode.FSNamesystem: 
updatePipeline(blk_1073741825_1001 => blk_1073741825_1002) success
2024-11-05 17:16:33,296 DEBUG namenode.FSEditLog: doEditTx() op=UpdateBlocksOp 
[path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1002], RpcClientId=12071ade-7bba-4e54-9c39-39ac40ba5172, 
RpcCallId=8] txid=62
2024-11-05 17:16:33,306 DEBUG namenode.FSEditLog: logEdit [RpcEdit 
op:UpdateBlocksOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1002], RpcClientId=, RpcCallId=-2] call:Call#9 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.fsync from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,306 DEBUG namenode.FSEditLog: logSync [RpcEdit 
op:UpdateBlocksOp [path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1002], RpcClientId=, RpcCallId=-2] call:Call#9 Retry#0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.fsync from 
ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,306 DEBUG namenode.FSEditLog: doEditTx() op=UpdateBlocksOp 
[path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
blocks=[blk_1073741825_1002], RpcClientId=, RpcCallId=-2] txid=63
2024-11-05 17:16:33,320 DEBUG blockmanagement.BlockManager: Reported block 
blk_1073741825_1002 on 172.31.88.149:9866 size 614400 replicaState = FINALIZED
2024-11-05 17:16:33,320 DEBUG blockmanagement.BlockManager: Reported block 
blk_1073741825_1002 on 172.31.88.149:9866 size 614400 replicaState = FINALIZED
2024-11-05 17:16:33,320 DEBUG BlockStateChange: BLOCK* addStoredBlock: 
172.31.88.149:9866 is added to blk_1073741825_1002 (size=409600)2024-11-05 
17:16:33,320 DEBUG BlockStateChange: BLOCK* block RECEIVED_BLOCK: 
blk_1073741825_1002 is received from 172.31.88.149:9866
2024-11-05 17:16:33,321 DEBUG namenode.FSEditLog: logEdit [RpcEdit op:CloseOp 
[length=0, inodeId=0, path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
replication=1, mtime=1730826993321, atime=1730826992556, blockSize=268435456, 
blocks=[blk_1073741825_1002], permissions=root:hdfsadmingroup:rw-r--r--, 
aclEntries=null, clientName=null, clientMachine=null, overwrite=false, 
storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=-12345] 
call:Call#10 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete 
from ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
2024-11-05 17:16:33,321 DEBUG namenode.FSEditLog: doEditTx() op=CloseOp 
[length=0, inodeId=0, path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
replication=1, mtime=1730826993321, atime=1730826992556, blockSize=268435456, 
blocks=[blk_1073741825_1002], permissions=root:hdfsadmingroup:rw-r--r--, 
aclEntries=null, clientName=null, clientMachine=null, overwrite=false, 
storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=64] txid=64
2024-11-05 17:16:33,321 DEBUG namenode.FSEditLog: logSync [RpcEdit op:CloseOp 
[length=0, inodeId=0, path=/tmp/testfile-25cdf6d8-202b-48a8-8046-18285db3a654, 
replication=1, mtime=1730826993321, atime=1730826992556, blockSize=268435456, 
blocks=[blk_1073741825_1002], permissions=root:hdfsadmingroup:rw-r--r--, 
aclEntries=null, clientName=null, clientMachine=null, overwrite=false, 
storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=64] 
call:Call#10 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete 
from ip-172-31-80-140.ec2.internal:42378 / 172.31.80.140:42378]
{quote}
 
 * blk_1073741825_* (decommissioning

{quote}2024-11-05 17:18:07,751 DEBUG BlockStateChange: BLOCK* 
NameSystem.LowRedundancyBlock.add: blk_1073741825_1002 has only 0 replicas and 
need 1 replicas so is added to neededReconstructions at priority level 0
2024-11-05 17:18:07,751 INFO BlockStateChange: Block: blk_1073741825_1002, 
Expected Replicas: 1, live replicas: 0, corrupt replicas: 0, decommissioned 
replicas: 0, decommissioning replicas: 1, maintenance replicas: 0, live 
entering maintenance replicas: 0, replicas on stale nodes:0, readonly replicas: 
0, excess replicas: 0, Is Open File: false, Datanodes having this block: 
172.31.88.149:9866 , Current Datanode: 172.31.88.149:9866, Is current datanode 
decommissioning: true, Is current datanode entering maintenance: false
2024-11-05 17:18:07,807 DEBUG blockmanagement.BlockReconstructionWork: Creating 
a ReplicationWork to reconstruct blk_1073741825_1002
2024-11-05 17:18:07,808 DEBUG BlockStateChange: BLOCK* block 
blk_1073741825_1002 is moved from neededReconstruction to pendingReconstruction
2024-11-05 17:18:07,808 DEBUG BlockStateChange: BLOCK* 
NameSystem.LowRedundancyBlock.remove: Removing block blk_1073741825_1002 from 
priority queue 0
2024-11-05 17:18:07,808 DEBUG BlockStateChange: BLOCK* ask [172.31.88.149:9866] 
to replicate blk_1073741825_1002 to datanode(s) 172.31.81.19:9866
2024-11-05 17:18:08,885 DEBUG blockmanagement.BlockManager: Removing pending 
reconstruction for blk_1073741825_1002
2024-11-05 17:18:08,885 DEBUG blockmanagement.BlockManager: Reported block 
blk_1073741825_1002 on 172.31.81.19:9866 size 614400 replicaState = FINALIZED
2024-11-05 17:18:08,885 DEBUG blockmanagement.BlockManager: Reported block 
blk_1073741825_1002 on 172.31.81.19:9866 size 614400 replicaState = FINALIZED
2024-11-05 17:18:08,885 DEBUG BlockStateChange: BLOCK* addStoredBlock: 
172.31.81.19:9866 is added to blk_1073741825_1002 (size=614400)
2024-11-05 17:18:08,885 DEBUG BlockStateChange: BLOCK* block RECEIVED_BLOCK: 
blk_1073741825_1002 is received from 172.31.81.19:9866
2024-11-05 17:18:44,130 DEBUG BlockStateChange: BLOCK* 
ExcessRedundancyMap.add(172.31.88.149:9866, blk_1073741825_1002)
2024-11-05 17:18:44,131 DEBUG BlockStateChange: BLOCK* InvalidateBlocks: add 
blk_1073741825_1002 to 172.31.88.149:9866
2024-11-05 17:18:44,131 DEBUG BlockStateChange: BLOCK* 
chooseExcessRedundancies: 
([DISK]DS-7b8aa0df-3522-4108-a553-37946b23eadb:NORMAL:172.31.88.149:9866, 
blk_1073741825_1002) is added to invalidated blocks set
2024-11-05 17:18:46,811 DEBUG BlockStateChange: BLOCK* BlockManager: ask 
172.31.88.149:9866 to delete [blk_1073741825_1002]
2024-11-05 17:18:53,737 DEBUG BlockStateChange: BLOCK* removeStoredBlock: 
blk_1073741825_1002 from 172.31.88.149:9866
2024-11-05 17:18:53,737 DEBUG BlockStateChange: BLOCK* 
ExcessRedundancyMap.remove(172.31.88.149:9866, blk_1073741825_1002)
2024-11-05 17:18:53,737 DEBUG BlockStateChange: BLOCK* block DELETED_BLOCK: 
blk_1073741825_1002 is received from 172.31.88.149:9866
2024-11-05 17:18:54,138 DEBUG blockmanagement.BlockManager: blocks = 
[blk_1073741825_1002]
{quote}
 

> HDFS decommissioning does not consider if Under Construction blocks are 
> sufficiently replicated which causes HDFS Data Loss
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17658
>                 URL: https://issues.apache.org/jira/browse/HDFS-17658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.4.0
>            Reporter: Kevin Wikant
>            Priority: Major
>
> h2. Background
> The HDFS Namenode manages datanode decommissioning using the 
> DatanodeAdminManager
> The DatanodeAdminManager has logic to prevent transitioning datanodes to 
> decommissioned state if they contain open blocks (i.e. Under Construction 
> blocks) which are not sufficiently replicated to other datanodes. In the 
> context of decomissioning, the reason that open blocks are important is 
> because they cannot be replicated to other datanodes given they are actively 
> being appended by an HDFS client. Because open blocks cannot be moved during 
> decommissioning, they should prevent the associated datanodes from becoming 
> decommissioned until the block is completed/finalized and it can be safely 
> moved to other live datanode(s).
>  
> h2. Problem
> The logic for DatanodeAdminManager to avoid decommissioning datanodes which 
> contain open blocks does not properly consider blocks which are in Under 
> Construction state. The DatanodeAdminManager is only considering blocks which 
> are in committed/finalized state. For reference:
>  * a block which is actively being appended by a DFSOutputStream is in Under 
> Construction state
>  * only when a DFSOutputStream is closed does the block transition from Under 
> Construction state to Committed/Finalized state
>  * then later when the block is reported to the namenode in a block report, 
> it will transition from Committed to Completed state
> Furthermore:
>  * this is true for a new file/block that was just created via 
> "DFSClient.create"
>  * this is true for an existing file/block that was just opened via 
> "DFSClient.append"
>  * this is true for all different dfs.replication factor values
>  
> h2. Impact
> In some environments, a datanode being decommissioned is taken as a signal 
> that all the blocks on that datanodes are sufficiently replicated to other 
> live datanodes & therefore it is safe to terminate the underlying virtual 
> host running the datanode.
> These types of environments can be impacted by HDFS data loss for open blocks 
> which are not considered in the datanode decommissioning process. Since open 
> blocks are not considered in determining if a datanode can be decommissioned, 
> a datanode may become decommissioned before its open blocks are replicated to 
> other datanodes. If the decommissioned datanode is then terminated, the open 
> blocks will be lost.
> For open blocks with replication of 1, it takes a single 
> decommissioned/terminated datanode to cause data loss. For open blocks with 
> greater replication, all the datanodes holding the open block must be 
> decommissioned/terminated for there to be data loss.
> I would also break the impact down into 2 cases:
>  * for a new HDFS block that has never been closed, arguably if the 
> DFSOutputStream encounters failure then the client should be able to replay 
> the data from source
>  * for an existing HDFS block that has previously been closed, when this 
> block is opened via a new DFSOutputStream the block is susceptible to being 
> lost & the client cannot replay data which was appended in the past by a 
> different client. This is arguably the worse case of impact.
>  
> h2. Testing
> This behaviour has been verified via testing on Hadoop 3.4.0; however, I 
> suspect it also applies to many other older/newer Hadoop versions.
> See JIRA comments for detailed test methodology & results.
> The following is a summary of the test cases & test results:
> {quote}*Test#1: Create Block & Repeatedly Append in Loop → Decommission 
> Datanode*
> ^ Expectation: block is considered during decommissioning; datanode is not 
> decommissioned until the write operation is finished & block is replicated to 
> another datanode.
> ^ Observation: block is not considered during decommissioning & is lost when 
> decommissioned data is terminated.
> *Test#2: Create Block & Repeatedly Append in Loop → Close DFSOutputStream → 
> Decommission Datanode*
> ^ Expectation: block is considered during decommissioning & is replicated to 
> another datanode as part of decommissioning.
> ^ Observation: block is considered during decommissioning & is replicated to 
> another datanode as part of decommissioning.
> *Test#3: Create Block & Repeatedly Append in Loop → Close DFSOutputStream → 
> Re-open Block & Repeatedly Append in Loop → Decommission Datanode*
> ^ Expectation: block is considered during decommissioning; datanode is not 
> decommissioned until the write operation is finished & block is replicated to 
> another datanode.
> ^ Observation: block is not considered during decommissioning & is lost when 
> decommissioned data is terminated.
> *Test#4: Create Block & Repeatedly Append in Loop → Close DFSOutputStream → 
> Re-open Block & Repeatedly Append in Loop → Close DFSOutputStream → 
> Decommission Datanode*
> ^ Expectation: block is considered during decommissioning & is replicated to 
> another datanode as part of decommissioning.
> ^ Observation: block is not considered during decommissioning & is lost when 
> decommissioned data is terminated.
> {quote}
> These were all tested with replication factor of 1 & 2, observation is the 
> same in both cases.
>  
> h2. Root Cause Theory
> The DatanodeAdminManager relies on the DatanodeDescriptor StorageInfos to 
> identify which blocks to consider as part of wether or not a datanode can be 
> decommissioned.
> Based on an examination of the HDFS Namenode & Datanode DEBUG logs during 
> testing, I believe the root cause has to do with the following 2 behaviours:
> {{*1.* When a DFSOutputStream is created for a block, that block enters Under 
> Construction state but the Namenode does not add Under Construction blocks to 
> the StorageInfos unless they have been committed/finalized which only occurs 
> when the DFSOutputStream is closed. Therefore, by checking the StorageInfos 
> only, the DatanodeAdminManager is not actually checking Under Construction 
> blocks.}}
> During the testing, we see that the Under Construction blocks are not in the 
> StorageInfos for the corresponding DatanodeDescriptor:
> {quote}2024-11-05 14:32:19,206 DEBUG BlockStateChange: BLOCK* block 
> RECEIVING_BLOCK: blk_1073741825_1001 is received from 172.31.93.123:9866
> ...
> 2024-11-05 14:36:02,805 INFO blockmanagement.DatanodeAdminManager: Starting 
> decommission of 172.31.93.123:9866 
> [DISK]DS-bb1d316c-a47a-4a3a-bf6e-0781945a50d1:NORMAL:172.31.93.123:9866 with 
> 0 blocks
> 2024-11-05 14:36:02,805 INFO blockmanagement.DatanodeAdminManager: Starting 
> decommission of 172.31.93.123:9866 
> [DISK]DS-95cfd2fa-25b0-4b20-aacf-e259201cf2eb:NORMAL:172.31.93.123:9866 with 
> 0 blocks
> {quote}
> Whereas, if the DFSOutputStream is closed, we see the block is included in 
> the StorageInfos:
> {quote}2024-11-05 14:49:49,563 DEBUG BlockStateChange: BLOCK* block 
> RECEIVED_BLOCK: blk_1073741825_1001 is received from 172.31.95.159:9866
> ...
> 2024-11-05 14:52:36,770 INFO blockmanagement.DatanodeAdminManager: Starting 
> decommission of 172.31.95.159:9866 
> [DISK]DS-07f2256b-0cbc-4b5e-9c6c-9108650ee896:NORMAL:172.31.95.159:9866 with 
> 0 blocks
> 2024-11-05 14:52:36,770 INFO blockmanagement.DatanodeAdminManager: Starting 
> decommission of 172.31.95.159:9866 
> [DISK]DS-91919280-df53-4dac-b983-0eb12c81e4bf:NORMAL:172.31.95.159:9866 with 
> 1 blocks
> 2024-11-05 14:52:48,231 INFO BlockStateChange: Block: blk_1073741825_1001, 
> Expected Replicas: 1, live replicas: 0, corrupt replicas: 0, decommissioned 
> replicas: 0, decommissioning replicas: 1, maintenance replicas: 0, live 
> entering maintenance replicas: 0, replicas on stale nodes:0, readonly 
> replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this 
> block: 172.31.95.159:9866 , Current Datanode: 172.31.95.159:9866, Is current 
> datanode decommissioning: true, Is current datanode entering maintenance: 
> false
> {quote}
> I think this behaviour might be related to the following code which has been 
> in Hadoop for the past 10+ years: 
> [https://github.com/apache/hadoop/blob/51ebc3c20e8ae7d4dced41cdd2f52715aea604cc/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3708]
>  
>  
> {{*2.* When a DFSOutputStream is created for an existing block, the Namenode 
> marks the existing block with previous generation stamp as stale & removes it 
> from the StorageInfos.}}
> We can see the following DEBUG logs during repro testing:
> {quote}2024-11-05 15:47:24,131 INFO namenode.FSNamesystem: 
> updatePipeline(blk_1073741825_1001, newGS=1002, newLength=307200, 
> newNodes=[172.31.95.208:9866], client=DFSClient_NONMAPREDUCE_-1408310900_1)
> 2024-11-05 15:47:24,132 DEBUG BlockStateChange: BLOCK* Removing stale replica 
> ReplicaUC[[DISK]DS-1bd20290-4662-4e5e-af0b-9b644d78d4f8:NORMAL:172.31.95.208:9866|RBW]
>  of blk_1073741825_1001
> 2024-11-05 15:47:24,132 INFO namenode.FSNamesystem: 
> updatePipeline(blk_1073741825_1001 => blk_1073741825_1002) success
> {quote}
> Tracing the code from the logs we see where this occurs:
>  * 
> [https://github.com/apache/hadoop/blob/f7651e2f63ddba9ed1ae4052e38464f85dd445f0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4476]
>  * 
> [https://github.com/apache/hadoop/blob/f7651e2f63ddba9ed1ae4052e38464f85dd445f0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4433]
>  * 
> [https://github.com/apache/hadoop/blob/f7651e2f63ddba9ed1ae4052e38464f85dd445f0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlocksMap.java#L202]
>  
>  
> h3. Potential Solution
> If possible, can consider adding Under Construction blocks to StorageInfos. 
> If this would have other adverse impacts, then another solution is to expose 
> the Under Construction blocks to the DatanodeAdminManager via another 
> separate data structure.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (HDFS-17658) HDFS decommissioning does not consider if Under Construction blocks are sufficiently replicated which causes HDFS Data Loss

Reply via email to