Stephen O'Donnell created HDFS-15725:
----------------------------------------
Summary: Lease Recovery never completes for a committed block
which the DNs never finalize
Key: HDFS-15725
URL: https://issues.apache.org/jira/browse/HDFS-15725
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 3.4.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell
In a very rare condition, the HDFS client process can get killed right at the
time it is completing a block / file.
The client sends the "complete" call to the namenode, moving the block into a
committed state, but it dies before it can send the final packet to the
Datanodes telling them to finalize the block.
This means the blocks are stuck on the datanodes in RBW state and nothing will
ever tell them to move out of that state.
The namenode / lease manager will retry forever to close the file, but it will
always complain it is waiting for blocks to reach minimal replication.
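
The stuck state described above can be illustrated with a small toy model. This
is only a sketch and not Hadoop code: the class StuckLeaseSketch, its enums and
the methods tryCloseFile and attemptsUntilClosed are hypothetical names made up
for illustration, and the model assumes that only FINALIZED replicas count
towards minimal replication for a committed block.

```java
import java.util.List;

// Toy model only; none of these names are real Hadoop classes or methods.
final class StuckLeaseSketch {
    enum ReplicaState { RBW, FINALIZED }

    // One close attempt on a file whose last block is COMMITTED.
    // Assumption: the close succeeds only when enough replicas are FINALIZED.
    static boolean tryCloseFile(List<ReplicaState> replicas, int minReplication) {
        long finalized = replicas.stream()
                .filter(r -> r == ReplicaState.FINALIZED)
                .count();
        return finalized >= minReplication;
    }

    // Retries the close up to maxAttempts times. The replica states never
    // change between attempts, because nothing tells the datanodes to
    // finalize. Returns the 1-based attempt that succeeded, or -1 if none did.
    static int attemptsUntilClosed(List<ReplicaState> replicas,
                                   int minReplication, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryCloseFile(replicas, minReplication)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<ReplicaState> allRbw =
                List.of(ReplicaState.RBW, ReplicaState.RBW, ReplicaState.RBW);
        System.out.println("closed on attempt: "
                + attemptsUntilClosed(allRbw, 1, 1000));
    }
}
```

In this model, however many times the close is retried, a block whose replicas
are all RBW never satisfies the minimal replication check.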
I have a simple test and patch to fix this, but I think it warrants some
discussion on whether this is the correct thing to do, or if I need to put the
fix behind a config switch.
My idea is that if lease recovery occurs and the block is still waiting on
"minimal replication", we put the file back to UNDER_CONSTRUCTION, so that on
the next lease recovery attempt block recovery happens, closes the file, and
moves the replicas to FINALIZED.
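
A rough sketch of this idea as a toy state machine follows. It is not the
actual patch and does not use Hadoop APIs: LeaseRecoveryFixSketch, ToyFile and
recoverLease are hypothetical names, and block recovery is reduced to "finalize
every replica and close the file" purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only; none of these names are real Hadoop classes or methods.
final class LeaseRecoveryFixSketch {
    enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }
    enum ReplicaState { RBW, FINALIZED }

    static final class ToyFile {
        BlockState lastBlock;
        final List<ReplicaState> replicas;
        final int minReplication;
        boolean closed = false;

        // All replicas start in RBW, as in the scenario described above.
        ToyFile(BlockState lastBlock, int replicaCount, int minReplication) {
            this.lastBlock = lastBlock;
            this.replicas = new ArrayList<>(
                    Collections.nCopies(replicaCount, ReplicaState.RBW));
            this.minReplication = minReplication;
        }

        long finalizedCount() {
            return replicas.stream()
                    .filter(r -> r == ReplicaState.FINALIZED)
                    .count();
        }
    }

    // One lease recovery attempt. Returns true if the file is closed afterwards.
    static boolean recoverLease(ToyFile f) {
        if (f.closed) {
            return true;
        }
        switch (f.lastBlock) {
            case COMMITTED:
                if (f.finalizedCount() >= f.minReplication) {
                    f.lastBlock = BlockState.COMPLETE;
                    f.closed = true;
                    return true;
                }
                // Proposed change: instead of waiting forever, fall back to
                // UNDER_CONSTRUCTION so the next attempt runs block recovery.
                f.lastBlock = BlockState.UNDER_CONSTRUCTION;
                return false;
            case UNDER_CONSTRUCTION:
                // Stand-in for block recovery: finalize replicas, close file.
                Collections.fill(f.replicas, ReplicaState.FINALIZED);
                f.lastBlock = BlockState.COMPLETE;
                f.closed = true;
                return true;
            default:
                f.closed = true;
                return true;
        }
    }

    public static void main(String[] args) {
        ToyFile f = new ToyFile(BlockState.COMMITTED, 3, 1);
        System.out.println("first attempt closed: " + recoverLease(f));
        System.out.println("second attempt closed: " + recoverLease(f));
    }
}
```

In this model the first recovery attempt only resets the block state, and the
second attempt closes the file, which mirrors the two-step behaviour proposed
above.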