[ https://issues.apache.org/jira/browse/HDFS-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226860#comment-16226860 ]
Daryn Sharp commented on HDFS-12747:
------------------------------------
I've been unable to determine how/why the received IBR for block1 was "lost".
I suspected a race condition with {{updatePipeline}}, which did occur for
block1. That reminded Kihwal of HDFS-11445, which describes the same race
condition. HDFS-11755 was believed to have inadvertently fixed the issue,
but apparently it does not.
> Lease monitor may infinitely loop on the same lease
> ---------------------------------------------------
>
> Key: HDFS-12747
> URL: https://issues.apache.org/jira/browse/HDFS-12747
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0
> Reporter: Daryn Sharp
> Priority: Critical
>
> Lease recovery incorrectly handles UC files if the last block is complete but
> the penultimate block is committed. "Incorrectly handles" is a euphemism for
> "infinitely loops for days and leaves all abandoned streams open until
> customers complain".
> The problem may manifest when:
> # Block1 is committed but seemingly never completed
> # Block2 is allocated
> # Lease recovery is initiated for block2
> # Commit block synchronization invokes {{FSNamesystem#closeFileCommitBlocks}},
> causing:
> #* {{commitOrCompleteLastBlock}} to mark block2 as complete
> #* {{finalizeINodeFileUnderConstruction}}/{{INodeFile.assertAllBlocksComplete}}
> to throw {{IllegalStateException}} because the penultimate block1 is
> "COMMITTED but not COMPLETE"
> # The next lease recovery results in an infinite loop.
> The {{LeaseManager}} expects that {{FSNamesystem#internalReleaseLease}} will
> either init recovery and renew the lease, or remove the lease. In the
> described state it does neither. The switch case will break out if the last
> block is complete. (The case statement ironically contains an assert).
> Since nothing changed, the lease is still the “next” lease to be processed.
> The lease monitor loops for 25ms on the same lease, sleeps for 2s, loops on
> it again.
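
The failure sequence in the description can be sketched as a toy model. This is not the NameNode code: {{LeaseLoopSketch}}, {{ToyFile}}, {{Outcome}}, and the iteration cap are hypothetical names invented for illustration, and the cap merely stands in for the 25ms time budget mentioned above. The method names echo the ones in the description, but their bodies are simplified assumptions based only on the behavior reported in this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class LeaseLoopSketch {
    enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    // What the lease monitor expects: one of the first two outcomes.
    enum Outcome { LEASE_REMOVED, RECOVERY_STARTED, NOTHING_CHANGED }

    /** Hypothetical stand-in for a file under construction. */
    static final class ToyFile {
        final List<BlockState> blocks = new ArrayList<>();
        boolean closed = false;

        // Simplified version of the check the description attributes to
        // INodeFile.assertAllBlocksComplete.
        void assertAllBlocksComplete() {
            for (int i = 0; i < blocks.size(); i++) {
                if (blocks.get(i) != BlockState.COMPLETE) {
                    throw new IllegalStateException(
                        "block" + (i + 1) + " is " + blocks.get(i) + " but not COMPLETE");
                }
            }
        }
    }

    /** Toy closeFileCommitBlocks: completes the last block, then tries to finalize. */
    static void closeFileCommitBlocks(ToyFile file) {
        file.blocks.set(file.blocks.size() - 1, BlockState.COMPLETE);
        file.assertAllBlocksComplete(); // throws if an earlier block is only COMMITTED
        file.closed = true;             // never reached in the failing scenario
    }

    /** Toy internalReleaseLease: assumed shape, based on the description only. */
    static Outcome internalReleaseLease(ToyFile file) {
        if (file.blocks.stream().allMatch(b -> b == BlockState.COMPLETE)) {
            file.closed = true;
            return Outcome.LEASE_REMOVED;
        }
        BlockState last = file.blocks.get(file.blocks.size() - 1);
        switch (last) {
            case COMPLETE:
                // Last block complete but an earlier one is not: break out,
                // neither removing nor renewing the lease.
                return Outcome.NOTHING_CHANGED;
            default:
                return Outcome.RECOVERY_STARTED;
        }
    }

    /** Toy monitor pass; returns how many iterations were spent on the one lease. */
    static int checkLeases(ToyFile file, int maxIterations) {
        int iterations = 0;
        boolean stillNextLease = true;
        while (stillNextLease && iterations < maxIterations) {
            iterations++;
            // A removed or renewed lease stops being the "next" lease.
            stillNextLease = internalReleaseLease(file) == Outcome.NOTHING_CHANGED;
        }
        return iterations;
    }

    public static void main(String[] args) {
        ToyFile file = new ToyFile();
        file.blocks.add(BlockState.COMMITTED);          // block1: committed, never completed
        file.blocks.add(BlockState.UNDER_CONSTRUCTION); // block2: being recovered
        try {
            closeFileCommitBlocks(file);
        } catch (IllegalStateException e) {
            System.out.println("finalize failed: " + e.getMessage());
        }
        System.out.println("iterations spent on the same lease: " + checkLeases(file, 1000));
    }
}
```

In this model the monitor always exhausts its budget on the stuck file, because nothing in the pass changes the file or the lease. A fix would need the {{COMPLETE}} branch to end in one of the two outcomes the monitor expects.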