[
https://issues.apache.org/jira/browse/HDFS-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227445#comment-16227445
]
Daryn Sharp commented on HDFS-12747:
------------------------------------
It involves decomming. HDFS-11499 attempted a partial fix for the problem. About
2.5 years ago I internally patched symptoms of the problem in a more complete
way, but based on what I understand now, that patch is not entirely sufficient.
> Lease monitor may infinitely loop on the same lease
> ---------------------------------------------------
>
> Key: HDFS-12747
> URL: https://issues.apache.org/jira/browse/HDFS-12747
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0
> Reporter: Daryn Sharp
> Priority: Critical
>
> Lease recovery incorrectly handles UC files if the last block is complete but
> the penultimate block is committed. Incorrectly handles is the euphemism for
> infinitely loops for days and leaves all abandoned streams open until
> customers complain.
> The problem may manifest when:
> # Block1 is committed but seemingly never completed
> # Block2 is allocated
> # Lease recovery is initiated for block2
> # Commit block synchronization invokes {{FSNamesystem#closeFileCommitBlocks}},
> causing:
> #* {{commitOrCompleteLastBlock}} to mark block2 as complete
> #*
> {{finalizeINodeFileUnderConstruction}}/{{INodeFile.assertAllBlocksComplete}}
> to throw {{IllegalStateException}} because the penultimate block1 is
> "COMMITTED but not COMPLETE"
> # The next lease recovery results in an infinite loop.
> The {{LeaseManager}} expects that {{FSNamesystem#internalReleaseLease}} will
> either init recovery and renew the lease, or remove the lease. In the
> described state it does neither. The switch case will break out if the last
> block is complete. (The case statement ironically contains an assert).
> Since nothing changed, the lease is still the “next” lease to be processed.
> The lease monitor loops for 25ms on the same lease, sleeps for 2s, loops on
> it again.
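
The stuck state described above can be illustrated with a small toy model. This
is only a sketch under stated assumptions, not the NameNode code: the class
LeaseLoopSketch, the methods tryCloseFile, releaseLease and countRetries, and
the Outcome enum are all hypothetical names invented for illustration, and the
monitor is modeled as a bounded retry count rather than the 25ms/2s timing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the stuck-lease state. Hypothetical names; not Hadoop code. */
final class LeaseLoopSketch {
    enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    /** The two outcomes the lease monitor expects, plus the one it does not. */
    enum Outcome { RECOVERY_STARTED_AND_RENEWED, LEASE_REMOVED, NOTHING_CHANGED }

    /** Same idea as the all-blocks-complete assertion: any non-COMPLETE block throws. */
    static void assertAllBlocksComplete(List<BlockState> blocks) {
        for (int i = 0; i < blocks.size(); i++) {
            if (blocks.get(i) != BlockState.COMPLETE) {
                throw new IllegalStateException(
                    "block " + i + " is " + blocks.get(i) + " but not COMPLETE");
            }
        }
    }

    /** Assumed stand-in for closing the file during commit block synchronization. */
    static boolean tryCloseFile(List<BlockState> blocks) {
        // The last block is marked complete before the whole file is checked.
        blocks.set(blocks.size() - 1, BlockState.COMPLETE);
        try {
            assertAllBlocksComplete(blocks);
            return true;
        } catch (IllegalStateException e) {
            // The file stays under construction and the lease is left untouched.
            return false;
        }
    }

    /** Assumed stand-in for the release step the lease monitor calls. */
    static Outcome releaseLease(List<BlockState> blocks) {
        BlockState last = blocks.get(blocks.size() - 1);
        if (last == BlockState.COMPLETE) {
            boolean allComplete =
                blocks.stream().allMatch(b -> b == BlockState.COMPLETE);
            // Last block complete but an earlier one only committed:
            // neither recovery nor removal happens.
            return allComplete ? Outcome.LEASE_REMOVED : Outcome.NOTHING_CHANGED;
        }
        return Outcome.RECOVERY_STARTED_AND_RENEWED;
    }

    /** Counts how many times the same lease is processed, capped at maxAttempts. */
    static int countRetries(List<BlockState> blocks, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            if (releaseLease(blocks) != Outcome.NOTHING_CHANGED) {
                break; // the lease was renewed or removed, so the monitor moves on
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Block1 committed but never completed, block2 allocated.
        List<BlockState> blocks = new ArrayList<>(
            Arrays.asList(BlockState.COMMITTED, BlockState.UNDER_CONSTRUCTION));
        System.out.println("closed: " + tryCloseFile(blocks));
        System.out.println("attempts: " + countRetries(blocks, 1000));
    }
}
```

In this model the failed close leaves block2 COMPLETE and block1 COMMITTED, so
every later call to releaseLease returns NOTHING_CHANGED and the retry count
always reaches whatever cap is supplied. A file whose earlier blocks are all
complete is released on the first attempt.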