[
https://issues.apache.org/jira/browse/HADOOP-15696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604693#comment-16604693
]
Hudson commented on HADOOP-15696:
---------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14882 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/14882/])
HADOOP-15696. KMS performance regression due to too many open file (weichiu:
rev e780556ae9229fe7a90817eb4e5449d7eed35dd8)
* (edit)
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/http/TestHttpServer.java
* (edit)
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpServer2.java
* (edit)
hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/resources/httpfs-default.xml
* (edit) hadoop-common-project/hadoop-kms/src/main/resources/kms-default.xml
> KMS performance regression due to too many open file descriptors after Jetty
> migration
> --------------------------------------------------------------------------------------
>
> Key: HADOOP-15696
> URL: https://issues.apache.org/jira/browse/HADOOP-15696
> Project: Hadoop Common
> Issue Type: Bug
> Components: kms
> Affects Versions: 3.0.0-alpha2
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: HADOOP-15696.001.patch, HADOOP-15696.002.patch,
> HADOOP-15696.003.patch, HADOOP-15696.branch-3.1.001.patch, Screen Shot
> 2018-08-22 at 11.36.16 AM.png, Screen Shot 2018-08-22 at 4.26.51 PM.png,
> Screen Shot 2018-08-22 at 4.26.51 PM.png, Screen Shot 2018-08-22 at 4.27.02
> PM.png, Screen Shot 2018-08-22 at 4.30.32 PM.png, Screen Shot 2018-08-22 at
> 4.30.39 PM.png, Screen Shot 2018-08-24 at 7.08.16 PM.png
>
>
> We recently found that KMS performance regressed in Hadoop 3.0, possibly due
> to the migration from Tomcat to Jetty in HADOOP-13597.
> Symptoms:
> # The number of open file descriptors in Hadoop 3.x KMS quickly rises to more
> than 10 thousand under stress and sometimes even exceeds 32K, the system
> limit, causing failures for any access to encryption zones. Our internal
> testing shows the open fd count was in the range of a few hundred in Hadoop
> 2.x, and it increases by almost 100x in Hadoop 3.
> # Hadoop 3.x KMS uses as much as twice the heap of Hadoop 2.x. A heap size
> that works in Hadoop 2.x can go OOM in Hadoop 3.x. JXray analysis suggests
> most of the heap usage is temporary byte arrays associated with open SSL
> connections.
> # Due to the heap usage, Hadoop 3.x KMS has more frequent GC activities, and
> we observed up to 20% performance reduction due to GC.
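The open file descriptor symptom above can be checked from inside a JVM. The following is a minimal sketch, not part of the patch: it assumes a Unix-like JVM that exposes `com.sun.management.UnixOperatingSystemMXBean`, and the class name `OpenFdProbe` is hypothetical. It reports the counts for the JVM it runs in, so watching a KMS process this way would mean running it in that process or reading the same bean over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class OpenFdProbe {
    /**
     * Returns {open fd count, max fd count} for the current JVM,
     * or null when the Unix-specific bean is not available.
     */
    static long[] sample() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // e.g. a non-Unix platform
    }

    public static void main(String[] args) {
        long[] s = sample();
        if (s == null) {
            System.out.println("fd counts are not available on this JVM");
        } else {
            System.out.printf("open fds: %d, limit: %d%n", s[0], s[1]);
        }
    }
}
```

Sampling this periodically under load would show whether the count stays in the hundreds or climbs toward the limit.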
> A possible solution is to reduce the idle timeout setting in HttpServer2. It
> is currently hard-coded to 10 seconds. By setting it to 1 second, open fds
> dropped from 20 thousand down to 3 thousand in my experiment.
> Filing this jira to invite open discussion of a solution.
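To illustrate why a shorter idle timeout releases file descriptors sooner, here is a minimal sketch using plain `java.net` sockets. It is not Jetty or HttpServer2 code and is not the patch itself; the class and method names (`IdleTimeoutSketch`, `serveOne`, `heldOpenMs`) are hypothetical, and the read timeout set with `Socket.setSoTimeout` stands in for a server-side idle timeout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class IdleTimeoutSketch {
    /**
     * Accepts one connection and closes it once no bytes have arrived for
     * idleTimeoutMs. Returns how long the connection was held open, in ms.
     */
    static long serveOne(ServerSocket server, int idleTimeoutMs) throws IOException {
        try (Socket conn = server.accept()) {
            long start = System.nanoTime();
            conn.setSoTimeout(idleTimeoutMs); // a blocked read() fails after this much silence
            InputStream in = conn.getInputStream();
            try {
                while (in.read() != -1) {
                    // Data arrived; the timeout applies afresh to the next read().
                }
            } catch (SocketTimeoutException idle) {
                // Idle too long: leaving the try block closes the socket and frees its fd.
            }
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    /** Connects a client that never sends data and measures how long the server holds it. */
    static long heldOpenMs(int idleTimeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            return serveOne(server, idleTimeoutMs);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("held with 200 ms timeout:  " + heldOpenMs(200) + " ms");
        System.out.println("held with 1000 ms timeout: " + heldOpenMs(1000) + " ms");
    }
}
```

Each idle connection occupies a descriptor for roughly the length of the timeout, so under a steady stream of short-lived clients a smaller timeout means fewer descriptors open at any moment. The trade-off is that clients reusing connections may have to reconnect more often.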
> Credit: [[email protected]] for the proposed Jetty idle timeout remedy;
> [~xiaochen] for digging into this problem.
> Screenshots:
> CDH5 (Hadoop 2) KMS CPU utilization, resident memory and file descriptor
> chart.
> !Screen Shot 2018-08-22 at 4.30.39 PM.png!
> CDH6 (Hadoop 3) KMS CPU utilization, resident memory and file descriptor
> chart.
> !Screen Shot 2018-08-22 at 4.30.32 PM.png!
> CDH5 (Hadoop 2) GC activities on the KMS process
> !Screen Shot 2018-08-22 at 4.26.51 PM.png!
> CDH6 (Hadoop 3) GC activities on the KMS process
> !Screen Shot 2018-08-22 at 4.27.02 PM.png!
> JXray report
> !Screen Shot 2018-08-22 at 11.36.16 AM.png!
> Open fds drop from 20k down to 3k after the proposed change.
> !Screen Shot 2018-08-24 at 7.08.16 PM.png!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)