Sebastian Bruckner created KAFKA-4061:
-----------------------------------------
Summary: Apache Kafka failover is not working
Key: KAFKA-4061
URL: https://issues.apache.org/jira/browse/KAFKA-4061
Project: Kafka
Issue Type: Bug
Affects Versions: 0.10.0.0
Environment: Linux
Reporter: Sebastian Bruckner
We have a 3-node cluster (kafka1 to kafka3) running 0.10.0.0.
When I shut down node kafka1, I see the following in the debug logs of my
consumers:
{code}
Sending coordinator request for group f49dc74f-3ccb-4fef-bafc-a7547fe26bc8 to
broker kafka3:9092 (id: 3 rack: null)
Received group coordinator response
ClientResponse(receivedTimeMs=1471511333843, disconnected=false,
request=ClientRequest(expectResponse=true,
callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@3892b449,
request=RequestSend(header={api_key=10,api_version=0,correlation_id=118,client_id=f49dc74f-3ccb-4fef-bafc-a7547fe26bc8},
body={group_id=f49dc74f-3ccb-4fef-bafc-a7547fe26bc8}),
createdTimeMs=1471511333794, sendTimeMs=1471511333794),
responseBody={error_code=0,coordinator={node_id=1,host=kafka1,port=9092}})
{code}
So the problem is that kafka3 answers with a response telling the consumer
that the coordinator is kafka1 (which is shut down).
This then happens over and over again.
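One possible way to narrow this down (a sketch, not a confirmed diagnosis): as far as I understand it, the group coordinator is the broker that leads the {{__consumer_offsets}} partition the group id hashes to, so if that partition had no replica outside kafka1, no other broker could take over. The snippet below computes that partition number for the group in the log. The mapping {{abs(groupId.hashCode()) % offsets.topic.num.partitions}}, the special case for {{Integer.MIN_VALUE}}, and the default of 50 partitions are all assumptions to check against the 0.10.0.0 broker source and this cluster's {{offsets.topic.num.partitions}} setting; the class name CoordinatorPartition is made up for illustration.

{code:java}
// Hypothetical helper: which __consumer_offsets partition (and therefore which
// broker, its leader) is assumed to coordinate a given consumer group.
public final class CoordinatorPartition {

    // Assumed broker-side abs(): Math.abs, except Integer.MIN_VALUE
    // (whose Math.abs is still negative) maps to 0.
    static int abs(int n) {
        return (n == Integer.MIN_VALUE) ? 0 : Math.abs(n);
    }

    // Assumed mapping: abs(groupId.hashCode()) % offsets.topic.num.partitions.
    static int partitionFor(String groupId, int offsetsTopicPartitions) {
        return abs(groupId.hashCode()) % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        String groupId = "f49dc74f-3ccb-4fef-bafc-a7547fe26bc8"; // group id from the log above
        int partitions = 50; // assumed default for offsets.topic.num.partitions
        System.out.println("__consumer_offsets-" + partitionFor(groupId, partitions));
    }
}
{code}

The leader and replicas of that partition can then be inspected with {{bin/kafka-topics.sh --zookeeper <zk-host:port> --describe --topic __consumer_offsets}}. If the replicas for that partition list only broker 1, that would be consistent with the behaviour described here.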
When I restart the consumer, I see the following:
{code}
Updated cluster metadata version 1 to Cluster(nodes = [kafka2:9092 (id: -2
rack: null), kafka1:9092 (id: -1 rack: null), kafka3:9092 (id: -3 rack: null)],
partitions = [])
... responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
{code}
The difference now is that it answers with error code 15
(GROUP_COORDINATOR_NOT_AVAILABLE).
Somehow Kafka doesn't elect a new group coordinator.
In a local setup with 2 brokers and 1 ZooKeeper node it works fine.
Can you help me debug this?
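To compare the two coordinator responses above side by side, here is a small, hypothetical helper (CoordinatorResponseDecoder is not part of Kafka) that pulls error_code and the coordinator fields out of the responseBody text printed in the consumer debug logs. It only names the two error codes seen in this report (0 and 15) and prints anything else as UNKNOWN; the regular expression assumes the exact field order shown in these log lines.

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the group coordinator response fields from
// the responseBody text printed in the consumer's debug log.
public final class CoordinatorResponseDecoder {

    // Assumes the field order seen in the logs: error_code, then coordinator.
    private static final Pattern BODY = Pattern.compile(
            "error_code=(-?\\d+),coordinator=\\{node_id=(-?\\d+),host=([^,]*),port=(-?\\d+)\\}");

    // Only the two codes that appear in this report are named here.
    static String errorName(int code) {
        switch (code) {
            case 0:  return "NONE";
            case 15: return "GROUP_COORDINATOR_NOT_AVAILABLE";
            default: return "UNKNOWN(" + code + ")";
        }
    }

    static String describe(String logLine) {
        Matcher m = BODY.matcher(logLine);
        if (!m.find()) {
            return "no coordinator response found";
        }
        int errorCode = Integer.parseInt(m.group(1));
        int nodeId = Integer.parseInt(m.group(2));
        String host = m.group(3); // empty when no coordinator is available
        int port = Integer.parseInt(m.group(4));
        if (errorCode != 0) {
            return errorName(errorCode) + ": no coordinator (node_id=" + nodeId + ")";
        }
        return "coordinator is node " + nodeId + " at " + host + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(describe(
                "responseBody={error_code=0,coordinator={node_id=1,host=kafka1,port=9092}})"));
        System.out.println(describe(
                "responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})"));
    }
}
{code}

For the two responses in this report it prints "coordinator is node 1 at kafka1:9092" and "GROUP_COORDINATOR_NOT_AVAILABLE: no coordinator (node_id=-1)".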
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)