Daryn Sharp created HDFS-12747:
----------------------------------
Summary: Lease monitor may infinitely loop on the same lease
Key: HDFS-12747
URL: https://issues.apache.org/jira/browse/HDFS-12747
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Priority: Critical
Lease recovery incorrectly handles UC (under-construction) files if the last
block is complete but the penultimate block is committed. "Incorrectly handles"
is a euphemism for "infinitely loops for days and leaves all abandoned streams
open until customers complain".
The problem may manifest when:
# Block1 is committed but seemingly never completed
# Block2 is allocated
# Lease recovery is initiated for block2
# Commit block synchronization invokes {{FSNamesystem#closeFileCommitBlocks}},
causing:
#* {{commitOrCompleteLastBlock}} to mark block2 as complete
#* {{finalizeINodeFileUnderConstruction}}/{{INodeFile.assertAllBlocksComplete}}
to throw {{IllegalStateException}} because the penultimate block1 is "COMMITTED
but not COMPLETE"
# The next lease recovery results in an infinite loop.
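
The sequence above can be modeled with a toy sketch. This is not the NameNode code: {{ToyFile}}, {{closeFile}}, {{completeLastBlock}} and {{stuckFile}} are hypothetical names that only mirror the ordering described in the steps, where the last block is marked complete before the all-blocks check throws.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; the class and method names are assumptions, not HDFS APIs.
class LeaseRecoverySketch {
    enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    static final class ToyFile {
        final List<BlockState> blocks = new ArrayList<>();
        boolean underConstruction = true;

        // Stands in for the "mark the last block as complete" step.
        void completeLastBlock() {
            blocks.set(blocks.size() - 1, BlockState.COMPLETE);
        }

        // Stands in for the check that every block is COMPLETE.
        void assertAllBlocksComplete() {
            for (int i = 0; i < blocks.size(); i++) {
                if (blocks.get(i) != BlockState.COMPLETE) {
                    throw new IllegalStateException(
                        "block" + (i + 1) + " is " + blocks.get(i) + " but not COMPLETE");
                }
            }
        }

        // The last block is mutated first, then the check may throw,
        // so the file stays under construction with a COMPLETE last block.
        void closeFile() {
            completeLastBlock();
            assertAllBlocksComplete();
            underConstruction = false;
        }
    }

    // Builds the state from the report: block1 COMMITTED, block2 just allocated.
    static ToyFile stuckFile() {
        ToyFile f = new ToyFile();
        f.blocks.add(BlockState.COMMITTED);
        f.blocks.add(BlockState.UNDER_CONSTRUCTION);
        return f;
    }

    public static void main(String[] args) {
        ToyFile f = stuckFile();
        try {
            f.closeFile();
        } catch (IllegalStateException e) {
            System.out.println("close failed: " + e.getMessage());
        }
        System.out.println("blocks=" + f.blocks
            + " underConstruction=" + f.underConstruction);
    }
}
```

After the failed close, the toy file is still under construction, its last block is complete, and block1 is still committed, which is the state the next lease recovery encounters.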
The {{LeaseManager}} expects that {{FSNamesystem#internalReleaseLease}} will
either initiate recovery and renew the lease, or remove the lease. In the
described state it does neither. The switch case will break out if the last
block is complete. (The case statement ironically contains an assert.) Since
nothing changed, the lease is still the “next” lease to be processed. The
lease monitor loops for 25ms on the same lease, sleeps for 2s, then loops on it
again.
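
A minimal sketch of why the monitor spins, assuming a simplified model: {{checkLeases}}, {{Releaser}} and {{Outcome}} are hypothetical names, the queue holds only expired leases in oldest-first order, and an iteration budget stands in for the 25ms time budget. It is not the {{LeaseManager}} implementation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy model only; names and structure are assumptions, not HDFS APIs.
class LeaseMonitorSketch {
    enum Outcome { RENEWED, REMOVED, UNCHANGED }

    // Stands in for the release call; the monitor assumes it always
    // returns RENEWED or REMOVED, never UNCHANGED.
    interface Releaser {
        Outcome release(String holder);
    }

    // One bounded pass over the expired leases.
    // Returns the number of iterations consumed from the budget.
    static int checkLeases(Deque<String> expired, Releaser releaser, int budget) {
        int iterations = 0;
        while (!expired.isEmpty() && iterations < budget) {
            String next = expired.peekFirst();  // always the oldest lease
            iterations++;
            Outcome outcome = releaser.release(next);
            if (outcome != Outcome.UNCHANGED) {
                // Renewed or removed: the lease is no longer the oldest expired one.
                expired.pollFirst();
            }
            // UNCHANGED: nothing moved, so the same lease is picked again.
        }
        return iterations;
    }

    public static void main(String[] args) {
        Deque<String> healthy =
            new ArrayDeque<>(Arrays.asList("client-A", "client-B", "client-C"));
        int used = checkLeases(healthy, h -> Outcome.REMOVED, 1000);
        System.out.println("healthy: iterations=" + used
            + " remaining=" + healthy.size());

        Deque<String> stuck =
            new ArrayDeque<>(Arrays.asList("client-A", "client-B", "client-C"));
        used = checkLeases(stuck,
            h -> h.equals("client-A") ? Outcome.UNCHANGED : Outcome.REMOVED, 1000);
        System.out.println("stuck: iterations=" + used
            + " remaining=" + stuck.size());
    }
}
```

In this model a single lease that is neither renewed nor removed consumes the whole budget on every pass, and the leases queued behind it are never reached.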