[ https://issues.apache.org/jira/browse/KAFKA-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manikumar resolved KAFKA-4061.
------------------------------
    Resolution: Cannot Reproduce

This is most likely due to the health of the consumer offsets topic. The replication factor of the "__consumer_offsets" topic should be greater than 1 for better availability.
Please reopen if you think the issue still exists.

> Apache Kafka failover is not working
> ------------------------------------
>
>                 Key: KAFKA-4061
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4061
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.0.0
>         Environment: Linux
>            Reporter: Sebastian Bruckner
>            Priority: Major
>
> We have a 3 node cluster (kafka1 to kafka3) on 0.10.0.0.
> When I shut down the node kafka1 I can see the following in the debug logs
> of my consumers:
> {code}
> Sending coordinator request for group f49dc74f-3ccb-4fef-bafc-a7547fe26bc8 to
> broker kafka3:9092 (id: 3 rack: null)
> Received group coordinator response
> ClientResponse(receivedTimeMs=1471511333843, disconnected=false,
> request=ClientRequest(expectResponse=true,
> callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@3892b449,
> request=RequestSend(header={api_key=10,api_version=0,correlation_id=118,client_id=f49dc74f-3ccb-4fef-bafc-a7547fe26bc8},
> body={group_id=f49dc74f-3ccb-4fef-bafc-a7547fe26bc8}),
> createdTimeMs=1471511333794, sendTimeMs=1471511333794),
> responseBody={error_code=0,coordinator={node_id=1,host=kafka1,port=9092}})
> {code}
> So the problem is that kafka3 answers with a response telling the consumer
> that the coordinator is kafka1 (which is shut down).
> This then happens over and over again.
> When I restart the consumer I can see the following:
> {code}
> Updated cluster metadata version 1 to Cluster(nodes = [kafka2:9092 (id: -2
> rack: null), kafka1:9092 (id: -1 rack: null), kafka3:9092 (id: -3 rack:
> null)], partitions = [])
> ...
> responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
> {code}
> The difference now is that it answers with error code 15
> (GROUP_COORDINATOR_NOT_AVAILABLE).
> Somehow Kafka doesn't elect a new group coordinator.
> In a local setup with 2 brokers and 1 ZooKeeper it works fine.
> Can you help me debug this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
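
A note on why the replication factor of "__consumer_offsets" matters for the behaviour reported above: the coordinator of a consumer group is the broker that leads one partition of that topic, and the partition is chosen by hashing the group id. The sketch below reproduces that mapping in plain Java. It assumes the broker-side rule is abs(groupId.hashCode()) % offsets.topic.num.partitions and that the topic has the default of 50 partitions; the class and method names are hypothetical and are not part of Kafka's API.

```java
// Hypothetical helper, not part of Kafka: maps a consumer group id to the
// __consumer_offsets partition whose leader acts as the group's coordinator.
final class CoordinatorPartition {

    // Assumed broker-side rule: abs(groupId.hashCode()) % numPartitions,
    // where abs() maps Integer.MIN_VALUE to 0 so the result is never negative.
    static int partitionFor(String groupId, int numPartitions) {
        int hash = groupId.hashCode();
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % numPartitions;
    }

    public static void main(String[] args) {
        // Group id taken from the debug log quoted in the issue.
        String groupId = "f49dc74f-3ccb-4fef-bafc-a7547fe26bc8";
        // Assumption: offsets.topic.num.partitions is left at its default of 50.
        int numPartitions = 50;
        System.out.println("__consumer_offsets partition for group " + groupId
                + ": " + partitionFor(groupId, numPartitions));
    }
}
```

If the partition printed for the group has kafka1 as its only replica, no other broker can take over as coordinator while kafka1 is down, which would be consistent with the responses quoted in the issue. Describing the "__consumer_offsets" topic on the cluster shows the replicas and current leader of that partition.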