[
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270026#comment-15270026
]
Mingliang Liu commented on HDFS-10220:
--------------------------------------
I also think changing from {{Lease leaseToCheck = sortedLeases.poll();}} to
{{Lease leaseToCheck = sortedLeases.peek();}} will address [~walter.k.su]'s
comment. Moreover, we can move the statements in the {{finally}} block out of it
and put them after the try-catch instead. I'm not in favor of "breaking" out of
an upper-level loop in the {{finally}} block, as hinted by
[ERR04-J.|https://www.securecoding.cert.org/confluence/display/java/ERR04-J.+Do+not+complete+abruptly+from+a+finally+block].
Other than this, I have some nits:
# {{isMaxLockHoldToReleaseLease}} can be private
# In the test, according to the {{assertEquals(expected, actual)}} signature, we
should swap the arguments to avoid a confusing test failure message.
{code:java}
- assertEquals(lm.countLease(), numLease);
+ assertEquals(numLease, lm.countLease());
{code}
# We may still need the javadoc for {{MAX_LOCK_HOLD_TO_RELEASE_LAESE_MS}}
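
For illustration, the {{peek()}} suggestion and the move out of {{finally}} can be sketched as below. This is a minimal sketch and not the patch itself: the {{Lease}} class, {{releaseExpired}}, {{newLeaseQueue}}, the {{releaser}} callback and the count-based {{maxReleases}} budget are all hypothetical stand-ins (the patch bounds the lock hold time via {{isMaxLockHoldToReleaseLease}}). The point is that a lease that is only peeked at stays queued when the loop stops, and that the bookkeeping sits after the try-catch so nothing completes abruptly from a {{finally}} block.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.function.Consumer;

class LeaseCheckSketch {
  /** Hypothetical stand-in for a lease; only the expiry time matters here. */
  static final class Lease {
    final String holder;
    final long expiresAtMs;

    Lease(String holder, long expiresAtMs) {
      this.holder = holder;
      this.expiresAtMs = expiresAtMs;
    }
  }

  /** Queue ordered by expiry time, oldest lease first. */
  static PriorityQueue<Lease> newLeaseQueue() {
    return new PriorityQueue<>(Comparator.comparingLong((Lease l) -> l.expiresAtMs));
  }

  /**
   * Processes expired leases, oldest first, and stops once maxReleases
   * leases have been handled (assumed budget, standing in for a time limit).
   * Returns the number of leases removed from the queue.
   */
  static int releaseExpired(PriorityQueue<Lease> sortedLeases, long nowMs,
                            int maxReleases, Consumer<Lease> releaser) {
    int processed = 0;
    // The budget check lives in the loop condition, not in a finally block.
    while (processed < maxReleases) {
      // peek() leaves the lease queued, so stopping here loses nothing.
      Lease leaseToCheck = sortedLeases.peek();
      if (leaseToCheck == null || leaseToCheck.expiresAtMs > nowMs) {
        break; // queue empty, or the oldest lease has not expired yet
      }
      try {
        releaser.accept(leaseToCheck);
      } catch (RuntimeException e) {
        System.err.println("Could not release lease of " + leaseToCheck.holder + ": " + e);
      }
      // Bookkeeping placed after the try-catch; it runs after a successful
      // release and after a caught failure alike.
      sortedLeases.poll();
      processed++;
    }
    return processed;
  }

  public static void main(String[] args) {
    PriorityQueue<Lease> leases = newLeaseQueue();
    leases.add(new Lease("client-a", 10));
    leases.add(new Lease("client-b", 20));
    leases.add(new Lease("client-c", 30));
    int n = releaseExpired(leases, 100, 2, l -> System.out.println("released " + l.holder));
    System.out.println(n + " processed, " + leases.size() + " still queued");
  }
}
```

With a budget of 2, the third expired lease is never polled and is picked up again by the next check, which is the behaviour the switch to {{peek()}} is meant to preserve.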
> Namenode failover due to too long locking in LeaseManager.Monitor
> -----------------------------------------------------------------
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Nicolas Fraison
> Assignee: Nicolas Fraison
> Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch,
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch,
> threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by
> the zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All
> existing blocks are COMPLETE, lease removed, file closed._
> In the thread dump taken by the zkfc there are lots of threads blocked on a
> lock.
> Looking at the code, there is a lock taken by the LeaseManager.Monitor when
> some leases must be released. Due to the very large number of leases to be
> released, the namenode took too long to release them, blocking all
> other tasks and making the zkfc think that the namenode was not
> available/stuck.
> The idea of this patch is to limit the number of leases released each time we
> check for leases, so the lock won't be held for too long.