Wei-Chiu Chuang created HADOOP-15696:
----------------------------------------

             Summary: KMS performance regression due to too many open file 
descriptors after Jetty migration
                 Key: HADOOP-15696
                 URL: https://issues.apache.org/jira/browse/HADOOP-15696
             Project: Hadoop Common
          Issue Type: Bug
          Components: kms
    Affects Versions: 3.0.0
            Reporter: Wei-Chiu Chuang
            Assignee: Wei-Chiu Chuang
         Attachments: Screen Shot 2018-08-22 at 11.36.16 AM.png, Screen Shot 
2018-08-22 at 4.26.51 PM.png, Screen Shot 2018-08-22 at 4.26.51 PM.png, Screen 
Shot 2018-08-22 at 4.27.02 PM.png, Screen Shot 2018-08-22 at 4.30.32 PM.png, 
Screen Shot 2018-08-22 at 4.30.39 PM.png, Screen Shot 2018-08-24 at 7.08.16 
PM.png

We recently found that KMS performance regressed in Hadoop 3.0, possibly due to 
the migration from Tomcat to Jetty in HADOOP-13597.

Symptoms:

# The number of open file descriptors in Hadoop 3.x KMS quickly rises to more 
than 10 thousand under stress, and sometimes even exceeds 32K, the system 
limit, causing failures for any access to encryption zones. Our internal 
testing shows the open fd count was in the range of a few hundred in Hadoop 
2.x, and it increased by almost 100x in Hadoop 3.
# Hadoop 3.x KMS uses as much as twice the heap of Hadoop 2.x. A heap size that 
works in Hadoop 2.x can go OOM in Hadoop 3.x. JXray analysis suggests most of 
the heap usage is temporary byte arrays associated with open SSL connections.
# Due to the heap usage, Hadoop 3.x KMS has more frequent GC activity, and we 
observed up to a 20% performance reduction due to GC.

A possible solution is to reduce the idle timeout setting in HttpServer2. It is 
currently hard-coded to 10 seconds. With it set to 1 second, open fds dropped 
from 20 thousand to 3 thousand in my experiment.
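
To illustrate why the idle timeout matters so much, here is a rough 
back-of-the-envelope sketch, not the actual HttpServer2 code. It assumes every 
request arrives on a fresh connection that the client never reuses or closes, 
so the server holds each connection (and its fd) for the request time plus the 
idle timeout. Under that assumption the steady-state count follows Little's 
law. The class name, method name, and the input numbers are all hypothetical 
and are not taken from the measurements above.

```java
/**
 * Rough estimate of steady-state open connections on a server that closes
 * idle connections after a fixed timeout. Illustrative only; the model and
 * all names here are assumptions, not part of Hadoop or Jetty.
 */
final class IdleTimeoutEstimate {

    /**
     * Little's law: connections in the system = arrival rate * time in system.
     * Assumes one new connection per request and no client-side reuse or
     * close, so each connection lives for (activeSeconds + idleTimeoutSeconds).
     */
    static double estimateOpenConnections(double requestsPerSecond,
                                          double activeSeconds,
                                          double idleTimeoutSeconds) {
        if (requestsPerSecond < 0 || activeSeconds < 0 || idleTimeoutSeconds < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        return requestsPerSecond * (activeSeconds + idleTimeoutSeconds);
    }

    public static void main(String[] args) {
        // Hypothetical workload: 100 new connections/s, 0.5 s of work each.
        double rate = 100.0;
        double active = 0.5;

        double with10s = estimateOpenConnections(rate, active, 10.0);
        double with1s = estimateOpenConnections(rate, active, 1.0);

        System.out.printf("idle timeout 10s: ~%.0f open connections%n", with10s);
        System.out.printf("idle timeout  1s: ~%.0f open connections%n", with1s);
    }
}
```

In this simplified model the idle timeout dominates the connection lifetime 
whenever requests are short, which is consistent in direction with the drop 
seen in the experiment. Real traffic with keep-alive reuse or client-side 
closes will behave differently.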

Filing this JIRA to invite open discussion of a solution.

Credit: [~mi...@cloudera.com] for the proposed Jetty idle timeout remedy; 
[~xiaochen] for digging into this problem.

Screenshots:

CDH5 (Hadoop 2) KMS CPU utilization, resident memory and file descriptor chart.
 !Screen Shot 2018-08-22 at 4.30.39 PM.png! 
CDH6 (Hadoop 3) KMS CPU utilization, resident memory and file descriptor chart.
 !Screen Shot 2018-08-22 at 4.30.32 PM.png! 

CDH5 (Hadoop 2) GC activities on the KMS process
 !Screen Shot 2018-08-22 at 4.26.51 PM.png! 
CDH6 (Hadoop 3) GC activities on the KMS process
 !Screen Shot 2018-08-22 at 4.27.02 PM.png! 

JXray report
 !Screen Shot 2018-08-22 at 11.36.16 AM.png! 

Open fds drop from 20K down to 3K after the proposed change.
 !Screen Shot 2018-08-24 at 7.08.16 PM.png! 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
