[https://issues.apache.org/jira/browse/KAFKA-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16633251#comment-16633251]

Yu Yang edited comment on KAFKA-7304 at 9/30/18 6:20 AM:
---------------------------------------------------------

[~rsivaram] Tested with the latest kafka 2.0 branch code, using d2.2x instances, 
a 16GB max heap size for the kafka process, and ~30k producers. With the 16GB 
heap size, we did not see frequent GC. But at the same time, we still hit the 
high CPU usage issue that is documented in KAFKA-7364. Did you see a high CPU 
usage issue in your case?

The following are our SSL-related kafka settings:
{code:java}
listeners=PLAINTEXT://:9092,SSL://:9093
security.inter.broker.protocol=PLAINTEXT
ssl.client.auth=required
ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
ssl.endpoint.identification.algorithm=HTTPS
ssl.key.password=key_password
ssl.keystore.location=keystore_location
ssl.keystore.password=keystore_password
ssl.keystore.type=JKS
ssl.secure.random.implementation=SHA1PRNG
ssl.truststore.location=truststore_path
ssl.truststore.password=truststore_password
ssl.truststore.type=JKS
{code}
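For completeness, the client side mirrors these broker settings. A minimal sketch of the producer-side SSL properties (paths, passwords, and the broker host are hypothetical placeholders, not our actual values):

```java
import java.util.Properties;

// Sketch of client-side SSL settings matching the broker config above.
// All locations and passwords are placeholders.
public class ProducerSslConfig {
    public static Properties sslProps() {
        Properties props = new Properties();
        // Point at the SSL listener (port 9093 in the broker config).
        props.put("bootstrap.servers", "broker1:9093");
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/etc/kafka/truststore.jks");
        props.put("ssl.truststore.password", "truststore_password");
        // Client key material is needed because the broker sets
        // ssl.client.auth=required (mutual TLS).
        props.put("ssl.keystore.location", "/etc/kafka/keystore.jks");
        props.put("ssl.keystore.password", "keystore_password");
        props.put("ssl.key.password", "key_password");
        return props;
    }
}
```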

The following is the GC chart on a broker running a kafka 2.0 binary with 
commits up to 
[https://github.com/apache/kafka/commit/74c8b831472ed07e10ceda660e0e504a6a6821c4]

[http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMDkvMzAvLS1nYy5sb2cuMC5jdXJyZW50Lmd6LS01LTM3LTQ3]

!Screen Shot 2018-09-29 at 10.38.12 PM.png|width=500!

The following is the CPU usage chart of our cluster. CPU usage jumped to 
almost 100% after we enabled TLS-based writing to the cluster. 

!Screen Shot 2018-09-29 at 10.38.38 PM.png|width=500!

We saw another issue with the following setting; see KAFKA-7450 for details. 
{code}
listeners=PLAINTEXT://:9092,SSL://:9093
security.inter.broker.protocol=SSL
{code}


> memory leakage in org.apache.kafka.common.network.Selector
> ----------------------------------------------------------
>
>                 Key: KAFKA-7304
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7304
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.1.0, 1.1.1
>            Reporter: Yu Yang
>            Priority: Critical
>             Fix For: 1.1.2, 2.0.1, 2.1.0
>
>         Attachments: 7304.v4.txt, 7304.v7.txt, Screen Shot 2018-08-16 at 
> 11.04.16 PM.png, Screen Shot 2018-08-16 at 11.06.38 PM.png, Screen Shot 
> 2018-08-16 at 12.41.26 PM.png, Screen Shot 2018-08-16 at 4.26.19 PM.png, 
> Screen Shot 2018-08-17 at 1.03.35 AM.png, Screen Shot 2018-08-17 at 1.04.32 
> AM.png, Screen Shot 2018-08-17 at 1.05.30 AM.png, Screen Shot 2018-08-28 at 
> 11.09.45 AM.png, Screen Shot 2018-08-29 at 10.49.03 AM.png, Screen Shot 
> 2018-08-29 at 10.50.47 AM.png, Screen Shot 2018-09-29 at 10.38.12 PM.png, 
> Screen Shot 2018-09-29 at 10.38.38 PM.png, Screen Shot 2018-09-29 at 8.34.50 
> PM.png
>
>
> We are testing secured writing to kafka through SSL. Testing at a small scale, 
> SSL writing to kafka was fine. However, when we enabled SSL writing at a 
> larger scale (>40k clients writing concurrently), the kafka brokers soon hit 
> an OutOfMemory issue with a 4GB memory setting. We tried increasing the 
> heap size to 10GB, but encountered the same issue. 
> We took a few heap dumps, and found that most of the heap memory is 
> referenced through org.apache.kafka.common.network.Selector objects. There 
> are two channel map fields in Selector. It seems that the objects are somehow 
> not removed from the maps in a timely manner. 
> One observation is that the memory leak seems related to kafka partition 
> leader changes. If a broker restart etc. in the cluster causes partition 
> leadership changes, the brokers may hit the OOM issue faster. 
> {code}
>     private final Map<String, KafkaChannel> channels;
>     private final Map<String, KafkaChannel> closingChannels;
> {code}
> Please see the attached images and the following link for a sample GC 
> analysis. 
> http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMDgvMTcvLS1nYy5sb2cuMC5jdXJyZW50Lmd6LS0yLTM5LTM0
> The command line for running kafka: 
> {code}
> java -Xms10g -Xmx10g -XX:NewSize=512m -XX:MaxNewSize=512m 
> -Xbootclasspath/p:/usr/local/libs/bcp -XX:MetaspaceSize=128m -XX:+UseG1GC 
> -XX:MaxGCPauseMillis=25 -XX:InitiatingHeapOccupancyPercent=35 
> -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=25 
> -XX:MaxMetaspaceFreeRatio=75 -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
> -XX:+PrintTenuringDistribution -Xloggc:/var/log/kafka/gc.log 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=40 -XX:GCLogFileSize=50M 
> -Djava.awt.headless=true 
> -Dlog4j.configuration=file:/etc/kafka/log4j.properties 
> -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.authenticate=false 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dcom.sun.management.jmxremote.port=9999 
> -Dcom.sun.management.jmxremote.rmi.port=9999 -cp /usr/local/libs/*  
> kafka.Kafka /etc/kafka/server.properties
> {code}
> We use Java 1.8.0_102, and have applied a TLS patch reducing the 
> X509Factory.certCache map size from 750 to 20. 
> {code}
> java -version
> java version "1.8.0_102"
> Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
> {code}
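The lifecycle of the two maps quoted in the description can be illustrated with a toy model. This is only a sketch of the suspected retention pattern (closed channels lingering in {{closingChannels}} until their buffered receives are drained), not Kafka's actual Selector code:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of Selector's `channels` / `closingChannels` maps.
// A closed channel with pending buffered receives is parked in
// closingChannels and only removed once its receives are drained,
// so slow polling can let the map grow.
public class SelectorSketch {
    public static class Channel {
        public final Queue<String> bufferedReceives = new ArrayDeque<>();
    }

    public final Map<String, Channel> channels = new HashMap<>();
    public final Map<String, Channel> closingChannels = new HashMap<>();

    public void register(String id) {
        channels.put(id, new Channel());
    }

    // close() defers full removal when receives are still buffered.
    public void close(String id) {
        Channel ch = channels.remove(id);
        if (ch != null && !ch.bufferedReceives.isEmpty())
            closingChannels.put(id, ch); // retained until drained
    }

    // Each poll drains one receive per closing channel; an entry is
    // dropped only once its queue is empty.
    public void poll() {
        closingChannels.entrySet().removeIf(e -> {
            e.getValue().bufferedReceives.poll();
            return e.getValue().bufferedReceives.isEmpty();
        });
    }
}
```

Under this model, anything that slows down poll cycles (such as the high CPU usage reported above) or that mass-closes channels (such as partition leadership changes) would let {{closingChannels}} accumulate entries, which is consistent with the heap-dump observations.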



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
