[ https://issues.apache.org/jira/browse/KAFKA-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145563#comment-16145563 ]
Apurva Mehta commented on KAFKA-5781:
-------------------------------------
This looks eerily similar to: https://issues.apache.org/jira/browse/KAFKA-4614
Which file system are you using? What do your JVM metrics look like during the
spikes?
> Frequent long produce latency periods that result in reduced produce rate.
> --------------------------------------------------------------------------
>
> Key: KAFKA-5781
> URL: https://issues.apache.org/jira/browse/KAFKA-5781
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.11.0.0
> Environment: CentOS Linux release 7.3.1611, kernel 3.10, Java
> version "1.8.0_121"
> Reporter: Raoufeh Hashemian
> Attachments: frequent_latency_increase_diskactivity.png,
> frequent_latency_increase.png, frequent_latency_increase_zoomed.png
>
>
> When we upgraded from Kafka 0.10.2 to 0.11.0, I started to see frequent
> throughput drops with a predictable pattern (the attached file shows the
> pattern over a 14-hour period). This resulted in a degradation of up to 30%
> in our overall produce throughput.
> The drops correlate with a significant increase in 99th percentile latency
> (up to 4 seconds). We have a cluster of 6 brokers and a single topic.
> The problem happens both with and without consumers running, so I only
> included a case without consumers.
> There is no specific message in the broker logs when the latency surge
> happens. However, I found a correlation between the log rotation messages in
> the log and the longer cycles in the pattern (details shown in the
> attached graph: frequent_latency_increase.png).
> Each period of increased latency lasts 5 to 20 minutes (shown in the
> zoomed graph in the attached files).
> Broker CPU utilization goes down during this time, and some disk read
> activity is observed (see attached graph).
> This pattern started to appear in our environment exactly when we switched
> to Kafka 0.11.0. We kept idempotence set to false and didn't make any
> configuration changes when we switched. So I am wondering whether this is a
> bug, or whether some configuration needs to be changed after the upgrade.