[ 
https://issues.apache.org/jira/browse/KAFKA-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140402#comment-16140402
 ] 

Jun Rao commented on KAFKA-5781:
--------------------------------

The metrics are described in http://kafka.apache.org/documentation/#monitoring. 
It would be useful to collect at least the following broker-side metrics:

kafka.network:type=RequestMetrics,name=RequestQueueTimeMs,request=Produce
kafka.network:type=RequestMetrics,name=LocalTimeMs,request=Produce
kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=Produce
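
One possible way to collect the three metrics above is a small JMX client. The following is only a sketch under assumptions: the class name ProduceRequestMetrics and its helper methods are hypothetical, the broker is assumed to have remote JMX enabled at the host:port passed as the first argument, and the attribute names "Mean" and "99thPercentile" are assumed to be how the broker's histogram MBeans expose their values. Without an argument the program only prints the MBean names.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import java.util.ArrayList;
import java.util.List;

public class ProduceRequestMetrics {
    // The three request-time metrics asked for in the comment.
    static final String[] METRIC_NAMES = {"RequestQueueTimeMs", "LocalTimeMs", "RemoteTimeMs"};

    // Builds the MBean name for one RequestMetrics metric and request type.
    public static String mbeanName(String metric, String request) {
        return "kafka.network:type=RequestMetrics,name=" + metric + ",request=" + request;
    }

    // Returns the MBean names of the three metrics for Produce requests.
    public static List<String> produceMBeanNames() {
        List<String> names = new ArrayList<>();
        for (String metric : METRIC_NAMES) {
            names.add(mbeanName(metric, "Produce"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No broker given: just list the MBean names.
            for (String name : produceMBeanNames()) {
                System.out.println(name);
            }
            return;
        }
        // args[0] is host:port of the broker's JMX endpoint (assumed to be enabled).
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            for (String name : produceMBeanNames()) {
                ObjectName objectName = new ObjectName(name);
                // Attribute names are an assumption about the histogram MBeans.
                Object mean = conn.getAttribute(objectName, "Mean");
                Object p99 = conn.getAttribute(objectName, "99thPercentile");
                System.out.println(name + " mean=" + mean + " p99=" + p99);
            }
        }
    }
}
```

Sampling these values on each broker during a latency surge and during a normal period should indicate whether the extra time is spent waiting in the request queue, in local processing on the leader, or waiting on followers.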

> Frequent long produce latency periods that result in reduced produce rate.
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-5781
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5781
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.11.0.0
>         Environment: CentOS Linux release 7.3.1611 , Kernel 3.10, java 
> version "1.8.0_121"
>            Reporter: Raoufeh Hashemian
>         Attachments: frequent_latency_increase_diskactivity.png, 
> frequent_latency_increase.png, frequent_latency_increase_zoomed.png
>
>
> When we upgraded from Kafka 0.10.2 to 0.11.0, I started to see frequent 
> throughput drops with a predictable pattern (the attached file shows the 
> pattern over a 14-hour period). This resulted in a degradation of up to 30% 
> in our overall produce throughput.
> The drops correlate with a significant increase in 99th percentile 
> latency (up to 4 seconds). We have a cluster of 6 brokers and a single topic. 
> The problem happens both with and without consumers running, so I only 
> included a case without consumers.
> There is no specific message in the broker logs when the latency surge 
> happens. However, I found a correlation between the log rotation messages in 
> the log and the longer cycles in the pattern (details shown in the 
> attached graph: frequent_latency_increase.png).
> Each increased latency period takes 5 to 20 minutes to finish (shown in the 
> zoomed graph in the attached files). 
> The broker CPU utilization goes down during this time and some disk read 
> activity is observed (see attached graph).
> This pattern started to appear in our environment exactly when we 
> switched to Kafka 0.11.0. We kept idempotence set to false and didn't make 
> any configuration changes as we switched. So I was wondering whether this 
> could be a bug, or whether some configuration needs to be changed after the 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)