David Arthur created KAFKA-18921: ------------------------------------ Summary: Low CPU utilization during streams integration tests Key: KAFKA-18921 URL: https://issues.apache.org/jira/browse/KAFKA-18921 Project: Kafka Issue Type: Improvement Components: build, streams Reporter: David Arthur Attachments: image-2025-03-04-17-05-48-768.png
While analyzing some build performance, I noticed that our CPU is not well utilized during the streams integration tests. We should investigate to make sure we're not needlessly waiting during the tests. Here is an example from this build scan https://develocity.apache.org/s/rqe4e2vh5jr72/timeline?page=1&sort=longest !image-2025-03-04-17-05-48-768.png! Notice that the ":streams:integration-tests:test" worker is running longer than the other works and we can see during that time that the CPU is not very utilized. This suggests that the tests are in some idle/waiting state. I tried to reproduce this locally and found at 40 second pause in a random test. [2025-03-04 16:56:11,606] INFO [Consumer clientId=consumer-f2eaa005-b1e1-406f-8c52-e50c71b4d807-58, groupId=f2eaa005-b1e1-406f-8c52-e50c71b4d807] Resetting offset for partition outputTopic_3-0 to position FetchPosition\{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:59515 (id: 0 rack: null isFenced: false)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState:447) [2025-03-04 16:56:55,214] INFO [Consumer clientId=testAutoOffsetId-a426e1fb-61ad-4ce8-92ca-f0fe337c2565-StreamThread-1-consumer, groupId=testAutoOffsetId] Successfully joined group with generation Generation\{generationId=4, memberId='testAutoOffsetId-a426e1fb-61ad-4ce8-92ca-f0fe337c2565-StreamThread-1-consumer-2f798603-d538-4041-a90d-2e1b922dd5af', protocol='stream'} (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:666) I grabbed a thread dump: at org.apache.kafka.streams.integration.utils.IntegrationTestUtils.readRecords(IntegrationTestUtils.java:1177) at org.apache.kafka.streams.integration.utils.IntegrationTestUtils.readKeyValues(IntegrationTestUtils.java:1136) at org.apache.kafka.streams.integration.utils.IntegrationTestUtils.lambda$waitUntilMinKeyValueRecordsReceived$4(IntegrationTestUtils.java:710) at org.apache.kafka.streams.integration.utils.IntegrationTestUtils$$Lambda$3338/0x00000070018f4fb0.call(Unknown Source) at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:483) at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:451) at org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:708) at org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:681) at org.apache.kafka.streams.integration.FineGrainedAutoResetIntegrationTest.shouldResetByDuration(FineGrainedAutoResetIntegrationTest.java:334) -- This message was sent by Atlassian Jira (v8.20.10#820010)