[
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281592#comment-15281592
]
Kihwal Lee edited comment on HDFS-10220 at 5/12/16 3:16 PM:
------------------------------------------------------------
The throughput is pathetic, but it seems in the ballpark of what I have seen.
In my experience, the {{commitBlockSynchronization()}} load generated by lease
recovery also affects performance greatly. The lease recovery may fill up the
edit buffer and cause auto-sync. Depending on the speed of edit syncing, a
massive lease recovery can overwhelm the edit buffering and I/O. It would be
nice to find out what the actual bottleneck is, so that we can improve the
performance.
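Roughly, the auto-sync mechanism looks like this (a schematic sketch only; the
class, threshold, and helper names here are hypothetical, not the actual
{{FSEditLog}} code):
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Schematic: each edit appended under load grows an in-memory buffer, and
// the write that crosses the threshold pays for a blocking sync to disk.
class EditBufferSketch {
  private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
  private static final int AUTO_SYNC_BYTES = 512 * 1024; // assumed threshold

  synchronized void logEdit(byte[] op) throws IOException {
    buf.write(op);
    // A burst of commitBlockSynchronization() edits from lease recovery
    // fills the buffer and forces the sync below over and over.
    if (buf.size() >= AUTO_SYNC_BYTES) {
      sync();
    }
  }

  private void sync() throws IOException {
    // Every caller that lands here blocks on journal I/O, which is how a
    // massive lease recovery can overwhelm edit buffering and syncing.
    buf.writeTo(journalStream());
    buf.reset();
  }

  private OutputStream journalStream() {
    return OutputStream.nullOutputStream(); // placeholder for the journal
  }
}
{code}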
The average RPC time of {{commitBlockSynchronization()}} that I have observed
lately is around 600us. After releasing 1K paths, there can be 1K
{{commitBlockSynchronization()}} calls in the worst case. That translates to
600ms, so cumulatively about 800ms will be spent under the namespace write
lock. Since the lease manager sleeps for 2 seconds, the NN will be spending
about 0.8/2.2 = 36% of its time exclusively on lease/block recovery.
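Spelling that out (a jshell-style fragment; the ~200ms for the release pass
itself is my assumption, implied by the 2.2s denominator):
{code:java}
// Back-of-the-envelope check of the 36% figure.
double cbsMs   = 1000 * 0.6;   // 1K commitBlockSynchronization() calls at ~600us each
double lockMs  = cbsMs + 200;  // plus ~200ms assumed for the release pass itself
double cycleMs = 2000 + 200;   // 2s monitor sleep + the release pass
System.out.printf("%.0f%%%n", 100 * lockMs / cycleMs); // prints "36%"
{code}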
This might be acceptable only for lightly loaded namenodes. Setting the limit
to 100ms will lower it to 24%, and 50ms will make it 12%. We also have to
weigh how important these lease recoveries are. I don't think this kind of
mass lease recovery is normal; it is usually caused by faulty user code (e.g.
not closing files before committing) and should not penalize other users by
greatly degrading NN performance. So I lean toward something like 50ms or
shorter.
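For concreteness, a minimal sketch of the kind of cap being discussed;
{{MAX_LOCK_HOLD_MS}}, {{haveExpiredLeases()}}, and {{releaseOne()}} are
placeholders, not the actual patch:
{code:java}
// Bound how long LeaseManager.Monitor holds the write lock per iteration,
// instead of releasing every expired lease in a single pass.
private static final long MAX_LOCK_HOLD_MS = 50; // the limit debated above

void checkLeases() {
  long start = Time.monotonicNow();
  fsnamesystem.writeLock();
  try {
    while (haveExpiredLeases()
        && Time.monotonicNow() - start < MAX_LOCK_HOLD_MS) {
      releaseOne(); // internalReleaseLease() on the next expired lease
    }
    // Leftover expired leases simply wait for the next 2s monitor cycle,
    // so other RPC handlers get the write lock back quickly.
  } finally {
    fsnamesystem.writeUnlock();
  }
}
{code}
Whatever the exact shape the patch takes, the point is that the lock hold time
per cycle becomes bounded by the limit rather than by the number of expired
leases.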
I want to hear what others think.
> Namenode failover due to too long locking in LeaseManager.Monitor
> ----------------------------------------------------------------
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Nicolas Fraison
> Assignee: Nicolas Fraison
> Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch,
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch,
> HADOOP-10220.006.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by
> the zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All
> existing blocks are COMPLETE, lease removed, file closed._
> In the thread dump taken by the zkfc there are lots of threads blocked due
> to a lock.
> Looking at the code, there is a lock taken by the LeaseManager.Monitor when
> some leases must be released. Due to the really large number of leases to be
> released, the namenode took too long to release them, blocking all other
> tasks and making the zkfc think that the namenode was not available/stuck.
> The idea of this patch is to limit the number of leases released each time
> we check for leases, so the lock is not held for too long a period.