Klearchos Chaloulos created KAFKA-5564:
------------------------------------------
Summary: Fail to create topics with error 'While recording the replica LEO, the partition [topic2,0] hasn't been created'
Key: KAFKA-5564
URL: https://issues.apache.org/jira/browse/KAFKA-5564
Project: Kafka
Issue Type: Bug
Affects Versions: 0.9.0.1
Reporter: Klearchos Chaloulos

Hello,

*Short version*

We have seen sporadic occurrences of the following issue: topics whose leader is a specific broker fail to be created properly, and it is impossible to produce to them or consume from them. The following log appears on the broker that is the leader of the faulty topics:

{noformat}
[2017-07-05 05:22:15,564] WARN [Replica Manager on Broker 3]: While recording the replica LEO, the partition [topic2,0] hasn't been created. (kafka.server.ReplicaManager)
{noformat}

*Detailed version*

Our setup consists of three brokers with ids 1, 2, 3. Broker 2 is the controller. We create 7 topics called topic1, topic2, topic3, topic4, topic5, topic6, topic7. Sometimes (sporadically) some of the topics are faulty. In the particular example I describe here, the faulty topics are topic6, topic4, topic2 and topic3. The faulty topics all have the same leader, broker 3.
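To see which partitions the warning refers to, and how often it repeats, the broker log can be tallied with a small script. This is only a minimal sketch in Python and not part of the original setup: the function name `count_leo_warnings` is hypothetical, and the regular expression is an assumption based solely on the single WARN line quoted above, so it may need adjusting for other Kafka versions.

```python
import re
from collections import Counter

# Assumed pattern, derived from the one WARN line quoted in this report.
LEO_WARN = re.compile(
    r"\[Replica Manager on Broker (?P<broker>\d+)\]: "
    r"While recording the replica LEO, the partition "
    r"\[(?P<topic>[^,\]]+),(?P<partition>\d+)\] hasn't been created"
)


def count_leo_warnings(lines):
    """Count LEO warnings per (broker id, topic, partition)."""
    counts = Counter()
    for line in lines:
        match = LEO_WARN.search(line)
        if match:
            key = (
                int(match.group("broker")),
                match.group("topic"),
                int(match.group("partition")),
            )
            counts[key] += 1
    return counts
```

It can be fed any iterable of log lines, for example an open file handle on the broker's server log, and returns a counter keyed by broker, topic and partition.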
If we run kafka-topics.sh --describe on the topics, we see that for the topics that do not have broker 3 as leader, the in-sync replica list (Isr) does not include broker 3:

{noformat}
bin/kafka-topics.sh --describe --zookeeper zookeeper:2181/kafka
Topic:topic6    PartitionCount:1    ReplicationFactor:3    Configs:
    Topic: topic6    Partition: 0    Leader: 3    Replicas: 3,1,2    Isr: 3,1,2
Topic:topic5    PartitionCount:1    ReplicationFactor:3    Configs:retention.ms=300000
    Topic: topic5    Partition: 0    Leader: 2    Replicas: 2,3,1    Isr: 2,1
Topic:topic7    PartitionCount:1    ReplicationFactor:3    Configs:
    Topic: topic7    Partition: 0    Leader: 1    Replicas: 1,3,2    Isr: 1,2
Topic:topic4    PartitionCount:1    ReplicationFactor:3    Configs:
    Topic: topic4    Partition: 0    Leader: 3    Replicas: 3,1,2    Isr: 3,1,2
Topic:topic1    PartitionCount:1    ReplicationFactor:3    Configs:
    Topic: topic1    Partition: 0    Leader: 2    Replicas: 2,1,3    Isr: 2,1
Topic:topic2    PartitionCount:1    ReplicationFactor:3    Configs:
    Topic: topic2    Partition: 0    Leader: 3    Replicas: 3,1,2    Isr: 3,1,2
Topic:topic3    PartitionCount:1    ReplicationFactor:3    Configs:
    Topic: topic3    Partition: 0    Leader: 3    Replicas: 3,1,2    Isr: 3,1,2
{noformat}

For the faulty topics, in contrast, all replicas are reported as in sync. Also, the topic directories under the log.dir folder were not created on the faulty broker 3.

We see the following log on broker 3, which is the leader of the faulty topics:

{noformat}
[2017-07-05 05:22:15,564] WARN [Replica Manager on Broker 3]: While recording the replica LEO, the partition [topic2,0] hasn't been created. (kafka.server.ReplicaManager)
{noformat}

This message is logged continuously. We also see the following error on the other two brokers, which hold the follower replicas:

{noformat}
ERROR [ReplicaFetcherThread-0-3], Error for partition [topic3,0] to broker 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition
{noformat}

Again, this message is logged continuously.
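The pattern in the describe output (healthy topics missing broker 3 from the Isr, faulty topics all led by broker 3) can be extracted mechanically. The following is a minimal sketch in Python, assuming the per-partition line format shown in the output above; the names `parse_describe` and `topics_by_leader` are hypothetical helpers, not Kafka tooling.

```python
import re
from collections import defaultdict

# Matches the per-partition lines of the describe output shown above.
# The summary lines ("Topic:topic6 PartitionCount:1 ...") do not match,
# because "PartitionCount:" is not "Partition:".
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def parse_describe(text):
    """Return one dict per partition, including replicas absent from the Isr."""
    rows = []
    for m in PARTITION_LINE.finditer(text):
        replicas = [int(x) for x in m.group("replicas").split(",") if x]
        isr = [int(x) for x in m.group("isr").split(",") if x]
        rows.append({
            "topic": m.group("topic"),
            "partition": int(m.group("partition")),
            "leader": int(m.group("leader")),
            "replicas": replicas,
            "isr": isr,
            "missing_from_isr": sorted(set(replicas) - set(isr)),
        })
    return rows


def topics_by_leader(rows):
    """Group topic names by leader broker id."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["leader"]].append(row["topic"])
    return dict(grouped)
```

Applied to the output above, `missing_from_isr` would be `[3]` for topic1, topic5 and topic7 and empty for the four faulty topics, so a check on the Isr alone would not flag them; grouping by leader is what singles out broker 3.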
The issue described above occurs immediately after the deployment of the Kafka cluster. A restart of the faulty broker (3 in this case) fixes the problem, and the faulty topics then work normally.

I have also attached the broker configuration we use.

Do you have any idea what might cause this issue?

Best regards,
Klearchos

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)