[
https://issues.apache.org/jira/browse/HADOOP-15696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594339#comment-16594339
]
Misha Dmitriev commented on HADOOP-15696:
-----------------------------------------
To be precise, the suggested measure, which had such a big effect, was to adjust
the timeout in the {{HttpServer2.configureChannelConnector(ServerConnector c)}}
method. Currently it contains the line {{c.setIdleTimeout(10000);}}. This timeout
should be made configurable in the first place, and it looks like we need to
adjust it to a (much) smaller value when {{HttpServer2}} is used by KMS.
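As a rough sketch of making the timeout configurable (the property name
{{kms.http.idle.timeout.ms}} below is hypothetical, not an existing Hadoop key,
and a real patch would read from the Hadoop Configuration object rather than a
system property):

```java
// Sketch only: read an idle timeout (ms) from a (hypothetical) property
// instead of hard-coding 10000 in HttpServer2.configureChannelConnector.
public class IdleTimeoutConfig {
    static final long DEFAULT_IDLE_TIMEOUT_MS = 10_000L;

    static long idleTimeoutMs() {
        String v = System.getProperty("kms.http.idle.timeout.ms");
        return (v == null) ? DEFAULT_IDLE_TIMEOUT_MS : Long.parseLong(v.trim());
    }

    public static void main(String[] args) {
        // In HttpServer2 this value would replace the literal in
        // c.setIdleTimeout(10000);
        System.out.println(idleTimeoutMs());
    }
}
```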
Here is a question that I have in this regard. If closing HTTP connections
on the server side, and thus recycling them more quickly, makes KMS work better,
does it mean that the KMS client doesn't reuse such connections, and/or
doesn't close a connection when it no longer needs it? If so, that doesn't
sound very optimal. I wonder how to prove or disprove this theory.
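One way to gather evidence would be to sample the server process's open fd count
over time while a single client issues requests that should reuse one connection.
On Linux a JVM can count its own fds by listing {{/proc/self/fd}} (a standalone
sketch, not part of KMS):

```java
import java.io.File;

// Sketch: count this JVM's open file descriptors by listing /proc/self/fd.
// Linux-specific; returns -1 where /proc is unavailable.
public class FdCount {
    static int count() {
        String[] fds = new File("/proc/self/fd").list();
        return fds == null ? -1 : fds.length;
    }

    public static void main(String[] args) {
        System.out.println(count());
    }
}
```

If the count keeps climbing while a well-behaved client should be reusing a
single connection, that would point at the client not reusing (or not closing)
connections.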
> KMS performance regression due to too many open file descriptors after Jetty
> migration
> --------------------------------------------------------------------------------------
>
> Key: HADOOP-15696
> URL: https://issues.apache.org/jira/browse/HADOOP-15696
> Project: Hadoop Common
> Issue Type: Bug
> Components: kms
> Affects Versions: 3.0.0-alpha2
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Blocker
> Attachments: Screen Shot 2018-08-22 at 11.36.16 AM.png, Screen Shot
> 2018-08-22 at 4.26.51 PM.png, Screen Shot 2018-08-22 at 4.26.51 PM.png,
> Screen Shot 2018-08-22 at 4.27.02 PM.png, Screen Shot 2018-08-22 at 4.30.32
> PM.png, Screen Shot 2018-08-22 at 4.30.39 PM.png, Screen Shot 2018-08-24 at
> 7.08.16 PM.png
>
>
> We recently found KMS performance regressed in Hadoop 3.0, possibly linking
> to the migration from Tomcat to Jetty in HADOOP-13597.
> Symptoms:
> # The number of open file descriptors in Hadoop 3.x KMS quickly rises to more
> than 10 thousand under stress, sometimes even exceeding 32K, the system limit,
> causing failures for any access to encryption zones. Our internal testing shows
> the open fd count was in the range of a few hundred in Hadoop 2.x, so it
> increases by almost 100x in Hadoop 3.
> # Hadoop 3.x KMS uses as much as twice the heap of Hadoop 2.x; the same
> heap size can go OOM in Hadoop 3.x. JXray analysis suggests most of the extra
> heap is temporary byte arrays associated with open SSL connections.
> # Due to the higher heap usage, Hadoop 3.x KMS has more frequent GC activity,
> and we observed up to 20% performance reduction due to GC.
> A possible solution is to reduce the idle timeout setting in HttpServer2. It
> is currently hard-coded to 10 seconds. By setting it to 1 second, open fds
> dropped from 20 thousand down to 3 thousand in my experiment.
> Filing this jira to invite open discussion of a solution.
> Credit: [[email protected]] for the proposed Jetty idle timeout remedy;
> [~xiaochen] for digging into this problem.
> Screenshots:
> CDH5 (Hadoop 2) KMS CPU utilization, resident memory and file descriptor
> chart.
> !Screen Shot 2018-08-22 at 4.30.39 PM.png!
> CDH6 (Hadoop 3) KMS CPU utilization, resident memory and file descriptor
> chart.
> !Screen Shot 2018-08-22 at 4.30.32 PM.png!
> CDH5 (Hadoop 2) GC activities on the KMS process
> !Screen Shot 2018-08-22 at 4.26.51 PM.png!
> CDH6 (Hadoop 3) GC activities on the KMS process
> !Screen Shot 2018-08-22 at 4.27.02 PM.png!
> JXray report
> !Screen Shot 2018-08-22 at 11.36.16 AM.png!
> Open fds drop from 20k down to 3k after the proposed change.
> !Screen Shot 2018-08-24 at 7.08.16 PM.png!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)