[
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281592#comment-15281592
]
Kihwal Lee edited comment on HDFS-10220 at 5/12/16 3:16 PM:
------------------------------------------------------------
The throughput is pathetic, but it seems in the ballpark of what I have seen.
In my experience, the {{commitBlockSynchronization()}} load generated by lease
recovery also affects performance greatly. The lease recovery may fill up the
edit buffer and cause auto-sync. Depending on the speed of edit syncing, a
massive lease recovery can overwhelm the edit buffering and I/O. It would be
nice to find out what the actual bottleneck is, so that we can improve the
performance.
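Roughly, the auto-sync mechanism looks like this (a schematic sketch only; the
class, threshold, and helper names here are hypothetical, not the actual
{{FSEditLog}} code):
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Schematic: each edit appended under load grows an in-memory buffer, and
// the write that crosses the threshold pays for a blocking sync to disk.
class EditBufferSketch {
  private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
  private static final int AUTO_SYNC_BYTES = 512 * 1024; // assumed threshold

  synchronized void logEdit(byte[] op) throws IOException {
    buf.write(op);
    // A burst of commitBlockSynchronization() edits from lease recovery
    // fills the buffer and forces the sync below over and over.
    if (buf.size() >= AUTO_SYNC_BYTES) {
      sync();
    }
  }

  private void sync() throws IOException {
    // Every caller that lands here blocks on journal I/O, which is how a
    // massive lease recovery can overwhelm edit buffering and syncing.
    buf.writeTo(journalStream());
    buf.reset();
  }

  private OutputStream journalStream() {
    return OutputStream.nullOutputStream(); // placeholder for the journal
  }
}
{code}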
The average RPC time of {{commitBlockSynchronization()}} that I have observed
lately is around 600us. After releasing 1K paths, there can be 1K
{{commitBlockSynchronization()}} calls in the worst case. That translates to
600ms, so cumulatively about 800ms will be spent under the namespace write
lock. Since the lease manager sleeps for 2 seconds, the NN will be spending
about 0.8/2.2 = 36% of its time exclusively on lease/block recovery.
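Spelling that out (a jshell-style fragment; the ~200ms for the release pass
itself is my assumption, implied by the 2.2s denominator):
{code:java}
// Back-of-the-envelope check of the 36% figure.
double cbsMs   = 1000 * 0.6;   // 1K commitBlockSynchronization() calls at ~600us each
double lockMs  = cbsMs + 200;  // plus ~200ms assumed for the release pass itself
double cycleMs = 2000 + 200;   // 2s monitor sleep + the release pass
System.out.printf("%.0f%%%n", 100 * lockMs / cycleMs); // prints "36%"
{code}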
This might be acceptable only for lightly loaded namenodes. Setting the limit
to 100ms will lower it to 24%, and 50ms will make it 12%. We also have to
weigh how important these lease recoveries are. I don't think this kind of
mass lease recovery is normal; it is usually caused by faulty user code (e.g.
not closing files before committing) and should not penalize other users by
greatly degrading NN performance. So I lean toward something like 50ms or
shorter.
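For concreteness, a minimal sketch of the kind of cap being discussed;
{{MAX_LOCK_HOLD_MS}}, {{haveExpiredLeases()}}, and {{releaseOne()}} are
placeholders, not the actual patch:
{code:java}
// Bound how long LeaseManager.Monitor holds the write lock per iteration,
// instead of releasing every expired lease in a single pass.
private static final long MAX_LOCK_HOLD_MS = 50; // the limit debated above

void checkLeases() {
  long start = Time.monotonicNow();
  fsnamesystem.writeLock();
  try {
    while (haveExpiredLeases()
        && Time.monotonicNow() - start < MAX_LOCK_HOLD_MS) {
      releaseOne(); // internalReleaseLease() on the next expired lease
    }
    // Leftover expired leases simply wait for the next 2s monitor cycle,
    // so other RPC handlers get the write lock back quickly.
  } finally {
    fsnamesystem.writeUnlock();
  }
}
{code}
Whatever the exact shape the patch takes, the point is that the lock hold time
per cycle becomes bounded by the limit rather than by the number of expired
leases.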
I want to hear what others think.
> Namenode failover due to too long locking in LeaseManager.Monitor
> ----------------------------------------------------------------
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Nicolas Fraison
> Assignee: Nicolas Fraison
> Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch,
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch,
> HADOOP-10220.006.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by
> the zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All
> existing blocks are COMPLETE, lease removed, file closed._
> In the thread dump taken by the zkfc there are lots of threads blocked due
> to a lock.
> Looking at the code, there is a lock taken by the LeaseManager.Monitor when
> some leases must be released. Due to the really large number of leases to be
> released, the namenode took too long to release them, blocking all other
> tasks and making the zkfc think that the namenode was not available/stuck.
> The idea of this patch is to limit the number of leases released each time
> we check for leases, so the lock is not held for too long a period.