James Brown created KAFKA-9339:
----------------------------------
Summary: Increased CPU utilization in brokers in 2.4.0
Key: KAFKA-9339
URL: https://issues.apache.org/jira/browse/KAFKA-9339
Project: Kafka
Issue Type: Bug
Affects Versions: 2.4.0
Environment: CentOS 6; Java 1.8.0_232 (OpenJDK)
Reporter: James Brown

I upgraded one of my company's test clusters from 2.3.1 to 2.4.0 and have
noticed a significant (40%) increase in the CPU time consumed. This is a small
cluster of three nodes (running on t2.large EC2 instances all in the same AZ)
pushing about 150 messages/s in aggregate, spread across 208 topics (a total of
266 partitions; most topics have only one partition). Leadership is reasonably
well-distributed: each node leads between 83 and 94 partitions.

This CPU time increase is visible symmetrically on all three nodes in the
cluster (e.g., the controller isn't using more CPU than the other nodes).

The CPU consumption did not return to normal after I did the second restart to
bump the log and inter-broker protocol versions to 2.4, so I don't think it has
anything to do with down-converting to the 2.3 protocols.

No settings were changed, nor was anything about the JVM changed. There is
nothing interesting being written to the logs. There's no sign of any
instability (partitions aren't being reassigned, etc.).

My best guess at the cause is increased allocation: the number of garbage
collections went up by approximately 30%, which suggests that something inside
Kafka is churning a lot more garbage. This is a small cluster, so each node has
only a 3 GB heap allocated to Kafka; we're using G1GC with some light tuning on
Java 8, if that helps.

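One way to put a number on the GC increase is to read the collector counters
that the JVM exposes through the standard java.lang.management API. The sketch
below is illustrative only: the class name GcCounters and its helper methods
are hypothetical (not part of Kafka), and as written it reports on the JVM it
runs in. Inspecting a broker would mean reading the same beans over a remote
JMX connection, which is assumed here and not shown.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of Kafka): reads cumulative GC counters
// for the JVM this code runs in, via the standard platform MXBeans.
class GcCounters {

    // Total number of collections across all collectors since JVM start.
    static long totalCollectionCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means "undefined" for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time, in milliseconds, across all collectors.
    static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 means "undefined" for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Relative change between two samples, as a percentage of the baseline.
    static double percentIncrease(long before, long after) {
        if (before <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return 100.0 * (after - before) / before;
    }

    public static void main(String[] args) {
        // Print one line per collector, then the totals.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("total: count=%d time=%dms%n",
                totalCollectionCount(), totalCollectionTimeMillis());
    }
}
```

Sampling these totals over the same interval before and after an upgrade and
passing the two per-interval deltas to percentIncrease gives a figure
comparable to the 30% quoted above.
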
We are only using OpenJDK, so I don't think I can produce a Flight Recorder
profile.

The kafka-users mailing list suggested this was worth filing a Jira issue about.