[ https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281592#comment-15281592 ]
Kihwal Lee commented on HDFS-10220:
-----------------------------------

The throughput is pathetic, but it seems in the ballpark of what I have seen. In my experience, the {{commitBlockSynchronization()}} load generated by lease recovery also affects performance greatly. The lease recovery may fill up the edit buffer and cause an auto-sync. Depending on the speed of edit syncing, a massive lease recovery can overwhelm the edit buffering and I/O. It would be nice to find out what the actual bottleneck is, so that we can improve the performance.

The average RPC time of {{commitBlockSynchronization()}} I have observed lately is around 600us. After releasing 1K paths, there can be 1K {{commitBlockSynchronization()}} calls in the worst case. That translates to 600ms, so overall about 800ms will be spent in the namespace write lock. Since the lease manager sleeps for 2 seconds, the NN will be spending about 0.8/2.2 = 36% of its time exclusively on lease/block recovery. This might only be acceptable for lightly loaded namenodes. Setting the limit to 100ms will lower it to 24%, and 50ms will make it 12%.

We also have to weigh how important these lease recoveries are. I don't think this kind of mass lease recovery is normal. It is usually caused by faulty user code (e.g. not closing files before committing), and it should not penalize other users by greatly degrading NN performance. So I lean toward something like 50ms or shorter. I want to hear what others think.
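The back-of-the-envelope numbers above can be reproduced with a small sketch. The figures (600us per {{commitBlockSynchronization()}} call, ~200ms of other lock-held work, and the 2.2s cycle the comment divides by) are taken from the comment itself; the class and method names are illustrative, not HDFS identifiers.

```java
public class LeaseRecoveryLoadEstimate {

    /** Fraction of a Monitor cycle spent holding the namespace write lock. */
    static double busyFraction(double lockHeldMs, double cycleMs) {
        return lockHeldMs / cycleMs;
    }

    public static void main(String[] args) {
        double cbsMsPerCall = 0.6;      // ~600us average commitBlockSynchronization() RPC
        int pathsReleased = 1000;       // 1K paths released in one batch
        double cbsMs = cbsMsPerCall * pathsReleased;   // 600ms in the worst case
        double otherMs = 200;           // remaining lock-held work (assumed, to reach ~800ms)
        double lockMs = cbsMs + otherMs;
        double cycleMs = 2200;          // 2.2s cycle, as used in the comment's arithmetic
        System.out.printf("busy fraction: %.0f%%%n",
            100 * busyFraction(lockMs, cycleMs));  // ~36%
    }
}
```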
> Namenode failover due to too long locking in LeaseManager.Monitor
> ----------------------------------------------------------------
>
>                 Key: HDFS-10220
>                 URL: https://issues.apache.org/jira/browse/HDFS-10220
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Nicolas Fraison
>            Assignee: Nicolas Fraison
>            Priority: Minor
>         Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, HADOOP-10220.006.patch, threaddump_zkfc.txt
>
> I have faced a namenode failover due to an unresponsive namenode detected by the zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed._
> In the thread dump taken by the zkfc there are many threads blocked on a lock.
> Looking at the code, a lock is taken by the LeaseManager.Monitor when leases must be released. Due to the really big number of leases to release, the namenode took too long to release them, blocking all other tasks and making the zkfc think the namenode was unavailable/stuck.
> The idea of this patch is to limit the number of leases released each time we check for leases, so the lock won't be held for too long a period.
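The patch idea described above, combined with the time limit discussed in the comment, can be sketched as a release loop that gives up the lock once a time budget is exhausted and resumes on the next Monitor cycle. This is a minimal illustration; the names ({{releaseExpired}}, {{MAX_LOCK_HOLD_MS}}) are assumptions, not the actual HDFS identifiers, and the real code runs under the FSNamesystem write lock.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BoundedLeaseRelease {
    // Cap on write-lock hold time per Monitor iteration (the comment
    // suggests something like 50ms or shorter).
    static final long MAX_LOCK_HOLD_MS = 50;

    /**
     * Releases expired leases until the queue is empty or the time cap
     * is hit; leftover leases wait for the next 2-second cycle.
     * Returns the number of leases released this round.
     */
    static int releaseExpired(Queue<String> expiredLeases) {
        long start = System.currentTimeMillis();
        int released = 0;
        // (namespace write lock would be acquired here in the NameNode)
        while (!expiredLeases.isEmpty()
                && System.currentTimeMillis() - start < MAX_LOCK_HOLD_MS) {
            expiredLeases.poll();   // stand-in for internalReleaseLease(...)
            released++;
        }
        // (write lock released here, letting other RPCs make progress)
        return released;
    }

    public static void main(String[] args) {
        Queue<String> q = new ArrayDeque<>();
        for (int i = 0; i < 100; i++) {
            q.add("/user/app/file-" + i);
        }
        System.out.println("released this round: " + releaseExpired(q));
    }
}
```

A count-based cap (release at most N leases per check, as in the attached patches) bounds lock hold time indirectly; the time-based cap shown here bounds it directly regardless of how expensive each release turns out to be.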