[
https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen O'Donnell updated HDFS-15725:
-------------------------------------
Attachment: HDFS-15725.003.patch
> Lease Recovery never completes for a committed block which the DNs never
> finalize
> ---------------------------------------------------------------------------------
>
> Key: HDFS-15725
> URL: https://issues.apache.org/jira/browse/HDFS-15725
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.4.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Attachments: HDFS-15725.001.patch, HDFS-15725.002.patch,
> HDFS-15725.003.patch, lease_recovery_2_10.patch
>
>
> In a very rare condition, the HDFS client process can get killed right at the
> time it is completing a block / file.
> The client sends the "complete" call to the namenode, moving the block into a
> committed state, but it dies before it can send the final packet to the
> Datanodes telling them to finalize the block.
> This means the blocks are stuck on the datanodes in RBW state and nothing
> will ever tell them to move out of that state.
> The namenode / lease manager will retry forever to close the file, but it
> will always complain it is waiting for blocks to reach minimal replication.
> I have a simple test and patch to fix this, but I think it warrants some
> discussion on whether this is the correct thing to do, or if I need to put
> the fix behind a config switch.
> My idea is that if lease recovery occurs and the block is still waiting on
> "minimal replication", the file is put back to UNDER_CONSTRUCTION so that
> on the next lease recovery attempt, BLOCK RECOVERY will happen, close the
> file, and move the replicas to FINALIZED.
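
The behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not the namenode code and not the attached patch. The class `LeaseRecoverySketch`, the `ToyBlock` type, the `revertCommitted` flag, and the method names are all hypothetical, and real block recovery is asynchronous and involves the datanodes, which the model collapses into a single step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: every name here is hypothetical, not a Hadoop API.
final class LeaseRecoverySketch {
    enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }
    enum ReplicaState { RBW, FINALIZED }

    static final class ToyBlock {
        BlockState state;
        final List<ReplicaState> replicas = new ArrayList<>();

        ToyBlock(BlockState state, int replicaCount) {
            this.state = state;
            // Every replica starts in RBW, as in the scenario where the client
            // died before sending the final packet to the datanodes.
            for (int i = 0; i < replicaCount; i++) {
                replicas.add(ReplicaState.RBW);
            }
        }

        int finalizedCount() {
            return Collections.frequency(replicas, ReplicaState.FINALIZED);
        }
    }

    // One lease recovery attempt. Returns true if the file could be closed.
    // revertCommitted models the proposed change (assumed flag name).
    static boolean recoverLease(ToyBlock block, int minReplication, boolean revertCommitted) {
        if (block.state == BlockState.COMPLETE) {
            return true;
        }
        if (block.state == BlockState.COMMITTED) {
            if (block.finalizedCount() >= minReplication) {
                block.state = BlockState.COMPLETE;
                return true;
            }
            // Still waiting on minimal replication. Without the change nothing
            // moves; with it, the block goes back to UNDER_CONSTRUCTION.
            if (revertCommitted) {
                block.state = BlockState.UNDER_CONSTRUCTION;
            }
            return false;
        }
        // UNDER_CONSTRUCTION: block recovery runs, modelled here as one
        // synchronous step that finalizes the replicas and closes the file.
        Collections.fill(block.replicas, ReplicaState.FINALIZED);
        block.state = BlockState.COMPLETE;
        return true;
    }

    // Retries recovery up to maxAttempts times with minimal replication of 1.
    // Returns the attempt number that closed the file, or -1 if none did.
    static int attemptsUntilClosed(ToyBlock block, boolean revertCommitted, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (recoverLease(block, 1, revertCommitted)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ToyBlock stuck = new ToyBlock(BlockState.COMMITTED, 3);
        System.out.println("without change: " + attemptsUntilClosed(stuck, false, 10));

        ToyBlock fixed = new ToyBlock(BlockState.COMMITTED, 3);
        System.out.println("with change: " + attemptsUntilClosed(fixed, true, 10));
    }
}
```

In this model a COMMITTED block whose replicas are all RBW never closes when `revertCommitted` is false, however many attempts are made, whereas with the flag set the first attempt reverts the block and the second one closes the file. Whether the real fix should sit behind a config switch is the open question raised above; the flag here only makes the two behaviours easy to compare.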
--
This message was sent by Atlassian Jira
(v8.3.4#803005)