James Brown created KAFKA-9339:
----------------------------------

Summary: Increased CPU utilization in brokers in 2.4.0
Key: KAFKA-9339
URL: https://issues.apache.org/jira/browse/KAFKA-9339
Project: Kafka
Issue Type: Bug
Affects Versions: 2.4.0
Environment: CentOS 6; Java 1.8.0_232 (OpenJDK)
Reporter: James Brown

I upgraded one of my company's test clusters from 2.3.1 to 2.4.0 and have noticed a significant (40%) increase in the CPU time consumed. This is a small cluster of three nodes (running on t2.large EC2 instances, all in the same AZ) pushing about 150 messages/s in aggregate, spread across 208 topics (a total of 266 partitions; most topics only have one partition). Leadership is reasonably well-distributed, and each node leads between 83 and 94 partitions. The CPU time increase is visible symmetrically on all three nodes in the cluster (e.g., the controller isn't using more CPU than the other nodes).

The CPU consumption did not return to normal after I did the second restart to bump the log and inter-broker protocol versions to 2.4, so I don't think it has anything to do with down-converting to the 2.3 protocols.

No settings were changed, nor was anything about the JVM changed. There is nothing interesting being written to the logs. There's no sign of any instability (partitions aren't being reassigned, etc.).

The best guess I have for the increased CPU usage is that the number of garbage collections increased by approximately 30%, suggesting that something is churning a lot more garbage inside Kafka. This is a small cluster, so it only has a 3 GB heap allocated to Kafka on each node; we're using G1GC with some light tuning and are on Java 8, if that helps. We are only using OpenJDK, so I don't think I can produce a Flight Recorder profile.

The kafka-users mailing list suggested this was worth filing a Jira issue about.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)