[ https://issues.apache.org/jira/browse/KAFKA-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismael Juma updated KAFKA-4460:
-------------------------------
    Labels: reliability  (was: )

> Consumer stops getting messages when partition leader dies
> ----------------------------------------------------------
>
>                 Key: KAFKA-4460
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4460
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.0.1
>            Reporter: Bernhard Bonigl
>              Labels: reliability
>
> I have a setup consisting of 2 Kafka brokers (0 and 1) using ZooKeeper, a
> Spring Boot application with producers and a Spring Boot application with
> consumers.
> The topic has 5 partitions and a replication factor of 2. Both brokers are in
> sync, and the partition leaders alternate between the two brokers (although
> this does not seem to matter).
> The Spring Boot Kafka configuration is set up as follows:
> {code}
> kafka.address: localhost:9092,localhost:9093
> kafka.numberOfConsumers: 20
> {code}
> Broker 0 listens on port 9092 and Broker 1 on port 9093.
> ----
> Events that are sent are consumed normally. When Broker 0 is killed, all
> partitions get Broker 1 as their leader, but the consumers stop consuming
> events until Broker 0 comes back up. The problem is easy to reproduce: it
> usually takes at most 3 attempts of alternately killing the leading broker
> to reach the error state.
> The console log is flooded with coordinator messages. The coordinator on
> broker 0 appears to be marked as dead, then immediately rediscovered and used
> again, many times over, and only at the end is the other broker discovered.
> When the failover works, the log shows only a few such messages and the other
> broker is discovered very quickly.
> This gist contains the application log from a run where the problem occurs.
> The first line is one of our own log messages, indicating a successfully
> consumed message. After that, Broker 0 (localhost:9092) is killed, and you can
> see the log spam described above. At the end, localhost:9093 is discovered,
> but no further messages are consumed.
> After that I killed the application.
> ----
> I also found this unresolved Stack Overflow question, which seems to describe
> the same problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)