Oliver Deakin created KAFKA-4446:
------------------------------------
Summary: If consumer offset topic created with less replicas than
min.insync.replicas, consuming is not possible
Key: KAFKA-4446
URL: https://issues.apache.org/jira/browse/KAFKA-4446
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.10.1.0
Environment: Ubuntu 16.04
Reporter: Oliver Deakin
This is a bit of an edge case but it has a high impact. I have seen an issue
multiple times while creating a new cluster of Kafka brokers and consuming
components in an automated deployment. Full details of the chain of events are
given below. I expect this could also occur if the first consume from a Kafka
cluster happens while some nodes are in a failure state.
It appears that the consumer offsets topic can be created with a replication
factor of only 1 or 2 (for example, if only 1 Kafka broker is alive when it is
created), but min.insync.replicas is still applied. If that value is higher
than the replication factor, it becomes impossible to consume any messages. It
seems that when a topic is created explicitly with a replication factor less
than min.insync.replicas, that rule should not be applied, as it makes the
topic unusable.
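To make the mismatch concrete, here is a minimal sketch of the condition described above. It is a simplified model, not Kafka's actual code: the function names are hypothetical, and it assumes the offset write requires the ISR size to reach min.insync.replicas (as the error in this report suggests).

```python
def max_isr_size(replication_factor: int) -> int:
    # The ISR is a subset of the assigned replicas, so it can never
    # grow beyond the replication factor of the partition.
    return replication_factor


def append_can_ever_succeed(replication_factor: int, min_insync_replicas: int) -> bool:
    # Simplified model (assumption): the write is accepted only when the
    # ISR size is at least min.insync.replicas.
    return max_isr_size(replication_factor) >= min_insync_replicas


if __name__ == "__main__":
    # With min.insync.replicas=2, a topic created with replication factor 1
    # can never satisfy the check, no matter how many brokers join later.
    for rf in (1, 2, 3):
        print(f"replication factor {rf}: usable={append_can_ever_succeed(rf, 2)}")
```

Under this model, a replication factor below min.insync.replicas is permanently unsatisfiable rather than a transient condition, which matches the "never recovers" behaviour reported here.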
Detailed scenario:
- Kafka is utilised as an event messaging pipeline around which a number of
components are deployed that produce and consume messages.
- Deployments of a new environment bring up all components, including a 3 node
Kafka cluster and some event-driven components at the same time.
- Our configuration sets min.insync.replicas=2.
- Kafka node 1 opens its listener port before the other two brokers come up.
- One of the components subscribes to a pre-created topic and attempts to
consume from it for the first time, also before the other two Kafka brokers
come up.
- Kafka node 1 creates the consumer offsets topic with replication factor 1,
as it is the only live broker. This is expected behaviour as per the
documentation for offsets.topic.replication.factor.
- Kafka node 1 fails with a repeating error message and never recovers when
attempting to send a consumer offset message to the topic, as there is only 1
member of the ISR but min.insync.replicas is 2. The repeating error message is:
kafka2_1 | org.apache.kafka.common.errors.NotEnoughReplicasException: Number of insync replicas for partition [__consumer_offsets,31] is [1], below required minimum [2]
- No consumers can consume from this cluster any more.
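
The chain of events above can be walked through with a small simulation. This is only a sketch under stated assumptions: OffsetsTopic, create_offsets_topic and commit_offset are hypothetical names, the configured offsets.topic.replication.factor of 3 is assumed (the report does not state it), and the replication factor is modelled as being capped by the number of live brokers at creation time, as described in the scenario.

```python
from dataclasses import dataclass


@dataclass
class OffsetsTopic:
    replication_factor: int
    isr_size: int


def create_offsets_topic(configured_rf: int, live_brokers: int) -> OffsetsTopic:
    # Assumption taken from the scenario: the topic is created with as many
    # replicas as there are live brokers, up to the configured factor.
    rf = min(configured_rf, live_brokers)
    return OffsetsTopic(replication_factor=rf, isr_size=rf)


def commit_offset(topic: OffsetsTopic, min_insync_replicas: int) -> str:
    # Simplified model of the check behind NotEnoughReplicasException.
    if topic.isr_size < min_insync_replicas:
        return (f"NotEnoughReplicas: ISR is [{topic.isr_size}], "
                f"below required minimum [{min_insync_replicas}]")
    return "ok"


if __name__ == "__main__":
    MIN_ISR = 2  # min.insync.replicas from the report

    # First consume happens while only node 1 is up.
    topic = create_offsets_topic(configured_rf=3, live_brokers=1)
    print(commit_offset(topic, MIN_ISR))

    # Nodes 2 and 3 come up later. The replica assignment of the existing
    # topic is not changed by that, so the same commit keeps failing.
    print(commit_offset(topic, MIN_ISR))

    # For comparison: all three brokers alive at creation time.
    print(commit_offset(create_offsets_topic(3, 3), MIN_ISR))
```

The second call is the interesting one: the topic keeps the replication factor it was created with, so the other brokers joining does not help unless the partitions are reassigned.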
(FYI 0.10.1.0 is still listed as unreleased in JIRA, but the project front page
says it's the latest release)