[ 
https://issues.apache.org/jira/browse/KAFKA-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Yang updated KAFKA-7364:
---------------------------
    Description: 
while testing ssl writing to kafka, we found that kafka often run into high cpu 
usage due to inefficiency in jdk ssl implementation. 

In detail, we use a test cluster that have 12 d2.8xlarge instances, jdk-10.0.2, 
 and hosts only one topic that have ~20k producers write to through ssl 
channel. We observed that  the network threads often get 100% cpu usage after 
enabling ssl writing to kafka.   To improve kafka's throughput, we have 
"num.network.threads=32" for the broker.  Even with 32 network threads, we see 
the broker cpu usage jump right after ssl writing is enabled.  The broker's cpu 
usage would drop immediately when we disabled ssl writing. 

 !Screen Shot 2018-08-30 at 10.57.32 PM.png|height=360! 

When the broker's cpu usage is high, 'perf top' shows that kafka is busy with 
executing code in  libsunec.so.  The following is a sample stack track that we 
get when the broker's cpu usage was high. 

{code}
Thread 77562: (state = IN_NATIVE)
 - sun.security.ec.ECDSASignature.verifySignedDigest(byte[], byte[], byte[], 
byte[]) @bci=0 (Compiled frame; information may be imprecise)
 - sun.security.ec.ECDSASignature.engineVerify(byte[]) @bci=70, line=321 
(Compiled frame)
 - java.security.Signature$Delegate.engineVerify(byte[]) @bci=9, line=1222 
(Compiled frame)
 - java.security.Signature.verify(byte[]) @bci=10, line=655 (Compiled frame)
 - sun.security.x509.X509CertImpl.verify(java.security.PublicKey, 
java.lang.String) @bci=136, line=444 (Compiled frame)
 - 
sun.security.provider.certpath.BasicChecker.verifySignature(java.security.cert.X509Certificate)
 @bci=48, line=166 (Compiled frame)
 - 
sun.security.provider.certpath.BasicChecker.check(java.security.cert.Certificate,
 java.util.Collection) @bci=24, line=147 (Compiled frame)
 - 
sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(java.security.cert.CertPath,
 java.util.List, java.util.List) @bci=316, line=125 (Compiled frame)
 - 
sun.security.provider.certpath.PKIXCertPathValidator.validate(java.security.cert.TrustAnchor,
 sun.security.provider.certpath.PKIX$ValidatorParams) @bci=390, line=233 
(Compiled frame)
 - 
sun.security.provider.certpath.PKIXCertPathValidator.validate(sun.security.provider.certpath.PKIX$ValidatorParams)
 @bci=217, line=141 (Compiled frame)
 - 
sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(java.security.cert.CertPath,
 java.security.cert.CertPathParameters) @bci=7, line=80 (Compiled frame)
 - java.security.cert.CertPathValidator.validate(java.security.cert.CertPath, 
java.security.cert.CertPathParameters) @bci=6, line=292 (Compiled frame)
 - 
sun.security.validator.PKIXValidator.doValidate(java.security.cert.X509Certificate[],
 java.security.cert.PKIXBuilderParameters) @bci=34, line=357 (Compiled frame)
 - 
sun.security.validator.PKIXValidator.engineValidate(java.security.cert.X509Certificate[],
 java.util.Collection, java.security.AlgorithmConstraints, java.lang.Object) 
@bci=232, line=259 (Compiled frame)
 - 
sun.security.validator.Validator.validate(java.security.cert.X509Certificate[], 
java.util.Collection, java.security.AlgorithmConstraints, java.lang.Object) 
@bci=6, line=260 (Compiled frame)
 - 
sun.security.ssl.X509TrustManagerImpl.validate(sun.security.validator.Validator,
 java.security.cert.X509Certificate[], java.security.AlgorithmConstraints, 
java.lang.String) @bci=10, line=324 (Compiled frame)
 - 
sun.security.ssl.X509TrustManagerImpl.checkTrusted(java.security.cert.X509Certificate[],
 java.lang.String, javax.net.ssl.SSLEngine, boolean) @bci=179, line=279 
(Compiled frame)
 - 
sun.security.ssl.X509TrustManagerImpl.checkClientTrusted(java.security.cert.X509Certificate[],
 java.lang.String, javax.net.ssl.SSLEngine) @bci=5, line=130 (Compiled frame)
 - 
sun.security.ssl.ServerHandshaker.clientCertificate(sun.security.ssl.HandshakeMessage$CertificateMsg)
 @bci=190, line=1966 (Compiled frame)
 - sun.security.ssl.ServerHandshaker.processMessage(byte, int) @bci=160, 
line=237 (Compiled frame)
 - sun.security.ssl.Handshaker.processLoop() @bci=96, line=1052 (Compiled frame)
 - sun.security.ssl.Handshaker$1.run() @bci=4, line=992 (Compiled frame)
 - sun.security.ssl.Handshaker$1.run() @bci=1, line=989 (Compiled frame)
 - 
java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
 java.security.AccessControlContext) @bci=0 (Compiled frame)
 - sun.security.ssl.Handshaker$DelegatedTask.run() @bci=24, line=1467 (Compiled 
frame)
 - org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks() 
@bci=13, line=393 (Compiled frame)
 - org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(boolean) 
@bci=88, line=473 (Compiled frame)
 - org.apache.kafka.common.network.SslTransportLayer.doHandshake() @bci=570, 
line=331 (Compiled frame)
 - org.apache.kafka.common.network.SslTransportLayer.handshake() @bci=48, 
line=258 (Compiled frame)
 - org.apache.kafka.common.network.KafkaChannel.prepare() @bci=16, line=125 
(Compiled frame)
 - org.apache.kafka.common.network.Selector.pollSelectionKeys(java.util.Set, 
boolean, long) @bci=251, line=494 (Compiled frame)
 - org.apache.kafka.common.network.Selector.poll(long) @bci=312, line=432 
(Compiled frame)
 - kafka.network.Processor.poll() @bci=7, line=678 (Compiled frame)
 - kafka.network.Processor.run() @bci=31, line=583 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=748 (Interpreted frame)
{code}



  was:
while testing ssl writing to kafka, we found that kafka often run into high cpu 
usage due to inefficiency in jdk ssl implementation. 

In detail, we use a test cluster that have 12 d2.8xlarge instances, jdk-10.0.2, 
 and hosts only one topic that have ~20k producers write to through ssl 
channel. We observed that  the network threads often get 100% cpu usage after 
enabling ssl writing to kafka.   To improve kafka's throughput, we have 
"num.network.threads=32" for the broker.  Even with 32 network threads, we see 
the broker cpu usage jump right after ssl writing is enabled.  The broker's cpu 
usage would drop immediately when we disabled ssl writing. 

 !Screen Shot 2018-08-30 at 10.57.32 PM.png! 

When the broker's cpu usage is high, 'perf top' shows that kafka is busy with 
executing code in  libsunec.so.  The following is a sample stack track that we 
get when the broker's cpu usage was high. 

{code}
Thread 77562: (state = IN_NATIVE)
 - sun.security.ec.ECDSASignature.verifySignedDigest(byte[], byte[], byte[], 
byte[]) @bci=0 (Compiled frame; information may be imprecise)
 - sun.security.ec.ECDSASignature.engineVerify(byte[]) @bci=70, line=321 
(Compiled frame)
 - java.security.Signature$Delegate.engineVerify(byte[]) @bci=9, line=1222 
(Compiled frame)
 - java.security.Signature.verify(byte[]) @bci=10, line=655 (Compiled frame)
 - sun.security.x509.X509CertImpl.verify(java.security.PublicKey, 
java.lang.String) @bci=136, line=444 (Compiled frame)
 - 
sun.security.provider.certpath.BasicChecker.verifySignature(java.security.cert.X509Certificate)
 @bci=48, line=166 (Compiled frame)
 - 
sun.security.provider.certpath.BasicChecker.check(java.security.cert.Certificate,
 java.util.Collection) @bci=24, line=147 (Compiled frame)
 - 
sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(java.security.cert.CertPath,
 java.util.List, java.util.List) @bci=316, line=125 (Compiled frame)
 - 
sun.security.provider.certpath.PKIXCertPathValidator.validate(java.security.cert.TrustAnchor,
 sun.security.provider.certpath.PKIX$ValidatorParams) @bci=390, line=233 
(Compiled frame)
 - 
sun.security.provider.certpath.PKIXCertPathValidator.validate(sun.security.provider.certpath.PKIX$ValidatorParams)
 @bci=217, line=141 (Compiled frame)
 - 
sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(java.security.cert.CertPath,
 java.security.cert.CertPathParameters) @bci=7, line=80 (Compiled frame)
 - java.security.cert.CertPathValidator.validate(java.security.cert.CertPath, 
java.security.cert.CertPathParameters) @bci=6, line=292 (Compiled frame)
 - 
sun.security.validator.PKIXValidator.doValidate(java.security.cert.X509Certificate[],
 java.security.cert.PKIXBuilderParameters) @bci=34, line=357 (Compiled frame)
 - 
sun.security.validator.PKIXValidator.engineValidate(java.security.cert.X509Certificate[],
 java.util.Collection, java.security.AlgorithmConstraints, java.lang.Object) 
@bci=232, line=259 (Compiled frame)
 - 
sun.security.validator.Validator.validate(java.security.cert.X509Certificate[], 
java.util.Collection, java.security.AlgorithmConstraints, java.lang.Object) 
@bci=6, line=260 (Compiled frame)
 - 
sun.security.ssl.X509TrustManagerImpl.validate(sun.security.validator.Validator,
 java.security.cert.X509Certificate[], java.security.AlgorithmConstraints, 
java.lang.String) @bci=10, line=324 (Compiled frame)
 - 
sun.security.ssl.X509TrustManagerImpl.checkTrusted(java.security.cert.X509Certificate[],
 java.lang.String, javax.net.ssl.SSLEngine, boolean) @bci=179, line=279 
(Compiled frame)
 - 
sun.security.ssl.X509TrustManagerImpl.checkClientTrusted(java.security.cert.X509Certificate[],
 java.lang.String, javax.net.ssl.SSLEngine) @bci=5, line=130 (Compiled frame)
 - 
sun.security.ssl.ServerHandshaker.clientCertificate(sun.security.ssl.HandshakeMessage$CertificateMsg)
 @bci=190, line=1966 (Compiled frame)
 - sun.security.ssl.ServerHandshaker.processMessage(byte, int) @bci=160, 
line=237 (Compiled frame)
 - sun.security.ssl.Handshaker.processLoop() @bci=96, line=1052 (Compiled frame)
 - sun.security.ssl.Handshaker$1.run() @bci=4, line=992 (Compiled frame)
 - sun.security.ssl.Handshaker$1.run() @bci=1, line=989 (Compiled frame)
 - 
java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
 java.security.AccessControlContext) @bci=0 (Compiled frame)
 - sun.security.ssl.Handshaker$DelegatedTask.run() @bci=24, line=1467 (Compiled 
frame)
 - org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks() 
@bci=13, line=393 (Compiled frame)
 - org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(boolean) 
@bci=88, line=473 (Compiled frame)
 - org.apache.kafka.common.network.SslTransportLayer.doHandshake() @bci=570, 
line=331 (Compiled frame)
 - org.apache.kafka.common.network.SslTransportLayer.handshake() @bci=48, 
line=258 (Compiled frame)
 - org.apache.kafka.common.network.KafkaChannel.prepare() @bci=16, line=125 
(Compiled frame)
 - org.apache.kafka.common.network.Selector.pollSelectionKeys(java.util.Set, 
boolean, long) @bci=251, line=494 (Compiled frame)
 - org.apache.kafka.common.network.Selector.poll(long) @bci=312, line=432 
(Compiled frame)
 - kafka.network.Processor.poll() @bci=7, line=678 (Compiled frame)
 - kafka.network.Processor.run() @bci=31, line=583 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=748 (Interpreted frame)
{code}




> kafka periodically run into high cpu usage with ssl writing
> -----------------------------------------------------------
>
>                 Key: KAFKA-7364
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7364
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.0.0
>            Reporter: Yu Yang
>            Priority: Major
>         Attachments: Screen Shot 2018-08-30 at 10.57.32 PM.png
>
>
> while testing ssl writing to kafka, we found that kafka often run into high 
> cpu usage due to inefficiency in jdk ssl implementation. 
> In detail, we use a test cluster that have 12 d2.8xlarge instances, 
> jdk-10.0.2,  and hosts only one topic that have ~20k producers write to 
> through ssl channel. We observed that  the network threads often get 100% cpu 
> usage after enabling ssl writing to kafka.   To improve kafka's throughput, 
> we have "num.network.threads=32" for the broker.  Even with 32 network 
> threads, we see the broker cpu usage jump right after ssl writing is enabled. 
>  The broker's cpu usage would drop immediately when we disabled ssl writing. 
>  !Screen Shot 2018-08-30 at 10.57.32 PM.png|height=360! 
> When the broker's cpu usage is high, 'perf top' shows that kafka is busy with 
> executing code in  libsunec.so.  The following is a sample stack track that 
> we get when the broker's cpu usage was high. 
> {code}
> Thread 77562: (state = IN_NATIVE)
>  - sun.security.ec.ECDSASignature.verifySignedDigest(byte[], byte[], byte[], 
> byte[]) @bci=0 (Compiled frame; information may be imprecise)
>  - sun.security.ec.ECDSASignature.engineVerify(byte[]) @bci=70, line=321 
> (Compiled frame)
>  - java.security.Signature$Delegate.engineVerify(byte[]) @bci=9, line=1222 
> (Compiled frame)
>  - java.security.Signature.verify(byte[]) @bci=10, line=655 (Compiled frame)
>  - sun.security.x509.X509CertImpl.verify(java.security.PublicKey, 
> java.lang.String) @bci=136, line=444 (Compiled frame)
>  - 
> sun.security.provider.certpath.BasicChecker.verifySignature(java.security.cert.X509Certificate)
>  @bci=48, line=166 (Compiled frame)
>  - 
> sun.security.provider.certpath.BasicChecker.check(java.security.cert.Certificate,
>  java.util.Collection) @bci=24, line=147 (Compiled frame)
>  - 
> sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(java.security.cert.CertPath,
>  java.util.List, java.util.List) @bci=316, line=125 (Compiled frame)
>  - 
> sun.security.provider.certpath.PKIXCertPathValidator.validate(java.security.cert.TrustAnchor,
>  sun.security.provider.certpath.PKIX$ValidatorParams) @bci=390, line=233 
> (Compiled frame)
>  - 
> sun.security.provider.certpath.PKIXCertPathValidator.validate(sun.security.provider.certpath.PKIX$ValidatorParams)
>  @bci=217, line=141 (Compiled frame)
>  - 
> sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(java.security.cert.CertPath,
>  java.security.cert.CertPathParameters) @bci=7, line=80 (Compiled frame)
>  - java.security.cert.CertPathValidator.validate(java.security.cert.CertPath, 
> java.security.cert.CertPathParameters) @bci=6, line=292 (Compiled frame)
>  - 
> sun.security.validator.PKIXValidator.doValidate(java.security.cert.X509Certificate[],
>  java.security.cert.PKIXBuilderParameters) @bci=34, line=357 (Compiled frame)
>  - 
> sun.security.validator.PKIXValidator.engineValidate(java.security.cert.X509Certificate[],
>  java.util.Collection, java.security.AlgorithmConstraints, java.lang.Object) 
> @bci=232, line=259 (Compiled frame)
>  - 
> sun.security.validator.Validator.validate(java.security.cert.X509Certificate[],
>  java.util.Collection, java.security.AlgorithmConstraints, java.lang.Object) 
> @bci=6, line=260 (Compiled frame)
>  - 
> sun.security.ssl.X509TrustManagerImpl.validate(sun.security.validator.Validator,
>  java.security.cert.X509Certificate[], java.security.AlgorithmConstraints, 
> java.lang.String) @bci=10, line=324 (Compiled frame)
>  - 
> sun.security.ssl.X509TrustManagerImpl.checkTrusted(java.security.cert.X509Certificate[],
>  java.lang.String, javax.net.ssl.SSLEngine, boolean) @bci=179, line=279 
> (Compiled frame)
>  - 
> sun.security.ssl.X509TrustManagerImpl.checkClientTrusted(java.security.cert.X509Certificate[],
>  java.lang.String, javax.net.ssl.SSLEngine) @bci=5, line=130 (Compiled frame)
>  - 
> sun.security.ssl.ServerHandshaker.clientCertificate(sun.security.ssl.HandshakeMessage$CertificateMsg)
>  @bci=190, line=1966 (Compiled frame)
>  - sun.security.ssl.ServerHandshaker.processMessage(byte, int) @bci=160, 
> line=237 (Compiled frame)
>  - sun.security.ssl.Handshaker.processLoop() @bci=96, line=1052 (Compiled 
> frame)
>  - sun.security.ssl.Handshaker$1.run() @bci=4, line=992 (Compiled frame)
>  - sun.security.ssl.Handshaker$1.run() @bci=1, line=989 (Compiled frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - sun.security.ssl.Handshaker$DelegatedTask.run() @bci=24, line=1467 
> (Compiled frame)
>  - org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks() 
> @bci=13, line=393 (Compiled frame)
>  - org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(boolean) 
> @bci=88, line=473 (Compiled frame)
>  - org.apache.kafka.common.network.SslTransportLayer.doHandshake() @bci=570, 
> line=331 (Compiled frame)
>  - org.apache.kafka.common.network.SslTransportLayer.handshake() @bci=48, 
> line=258 (Compiled frame)
>  - org.apache.kafka.common.network.KafkaChannel.prepare() @bci=16, line=125 
> (Compiled frame)
>  - org.apache.kafka.common.network.Selector.pollSelectionKeys(java.util.Set, 
> boolean, long) @bci=251, line=494 (Compiled frame)
>  - org.apache.kafka.common.network.Selector.poll(long) @bci=312, line=432 
> (Compiled frame)
>  - kafka.network.Processor.poll() @bci=7, line=678 (Compiled frame)
>  - kafka.network.Processor.run() @bci=31, line=583 (Compiled frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Interpreted frame)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to