zhangtongr created KAFKA-19558:
----------------------------------

             Summary: kafka-consumer-groups.sh --describe --all-groups command 
times out on large clusters with many consumer groups
                 Key: KAFKA-19558
                 URL: https://issues.apache.org/jira/browse/KAFKA-19558
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 2.7.1
            Reporter: zhangtongr


Description:
In a Kafka cluster with a large number of consumer groups (over 380) and 
topics (over 500), running kafka-consumer-groups.sh --describe --all-groups 
consistently times out and returns no results.

Command used:

./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --all-groups

Observed behavior:
The command fails with a TimeoutException, and no consumer group information is 
returned. The following stack trace is observed:

java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1753170317381, tries=1, nextAllowedTryMs=1753170317482) timed out at 1753170317382 after 1 attempt(s)
    at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
    at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
    ...
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=..., tries=1, ...) timed out
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: describeConsumerGroups

Expected behavior:
The command should be able to return the description of all consumer groups, or 
at least fail more gracefully. Ideally, there should be:

A way to paginate or batch the describe operation;

Or configuration options to increase internal timeout thresholds;

Or better recommendations for dealing with large clusters.
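
On the timeout point, kafka-consumer-groups.sh already accepts a --timeout 
option in milliseconds (default 5000). Whether raising it is enough for a 
cluster of this size has not been verified here; the sketch below only shows 
how it could be passed. The variable names KAFKA_CONSUMER_GROUPS and 
BOOTSTRAP_SERVER and the function describe_all_groups are illustrative, not 
part of Kafka.

```shell
#!/bin/sh
# Sketch: retry --describe --all-groups with a larger tool timeout.
# KAFKA_CONSUMER_GROUPS and BOOTSTRAP_SERVER are assumed names; adjust them
# to the local installation.
KAFKA_CONSUMER_GROUPS="${KAFKA_CONSUMER_GROUPS:-./kafka-consumer-groups.sh}"
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# describe_all_groups [TIMEOUT_MS]
# Runs the command from this report with --timeout set (60000 ms if omitted).
describe_all_groups() {
  "$KAFKA_CONSUMER_GROUPS" --bootstrap-server "$BOOTSTRAP_SERVER" \
    --describe --all-groups --timeout "${1:-60000}"
}
```

For example, describe_all_groups 120000 would run the same command with a 
two-minute timeout.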

Additional context:

Manually describing individual consumer groups via --group performs as expected 
and returns data quickly.

The issue appears to scale linearly with the number of consumer groups.


