[ https://issues.apache.org/jira/browse/KAFKA-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419992#comment-15419992 ]

Grant Henke commented on KAFKA-3959:
------------------------------------

[~onurkaraman] I understand the need for a quick fix and that automatically maintaining a given replication factor will take some time to implement. I wasn't proposing all of that KIP work for this fix. I was thinking just changing the tracked metadata to contain the "target replication factor" and leveraging it to report under replicated partitions would provide an admin enough to diagnose and fix the problem quickly. I referenced the KIPs to show that the change also supports an ultimate fix down the road.

I am worried about __consumer_offsets topic creation failing with GROUP_COORDINATOR_NOT_AVAILABLE until there is at least KafkaConfig.offsetsTopicReplicationFactor brokers because the default for KafkaConfig.offsetsTopicReplicationFactor is 3. That change means any cluster with < 3 brokers will need to change defaults before starting. Including all of the embedded clusters in our tests and likely many users and frameworks development clusters and tests as well.

> __consumer_offsets wrong number of replicas at startup
> ------------------------------------------------------
>
>                 Key: KAFKA-3959
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3959
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, offset manager, replication
>    Affects Versions: 0.9.0.1, 0.10.0.0
>         Environment: Brokers of 3 kafka nodes running Red Hat Enterprise
> Linux Server release 7.2 (Maipo)
>            Reporter: Alban Hurtaud
>
> When creating a stack of 3 kafka brokers, the consumer is starting faster
> than kafka nodes and when trying to read a topic, only one kafka node is
> available.
> So the __consumer_offsets is created with a replication factor set to 1
> (instead of configured 3) :
> offsets.topic.replication.factor=3
> default.replication.factor=3
> min.insync.replicas=2
> Then, other kafka nodes go up and we have exceptions because the replicas #
> for __consumer_offsets is 1 and min insync is 2. So exceptions are thrown.
> What I missed is : Why the __consumer_offsets is created with replication to
> 1 (when 1 broker is running) whereas in server.properties it is set to 3 ?
> To reproduce :
> - Prepare 3 kafka nodes with the 3 lines above added to servers.properties.
> - Run one kafka,
> - Run one consumer (the __consumer_offsets is created with replicas =1)
> - Run 2 more kafka nodes

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
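
To make the scenario in this thread concrete, here is a minimal Python sketch of the reported behaviour and of the "target replication factor" idea from the comment. It is a toy model, not Kafka code: `PartitionState`, `create_offsets_partition`, `is_under_replicated` and `can_accept_write` are hypothetical names invented for illustration. The creation rule (replicas capped at the number of live brokers) is assumed from the reproduction steps in the report.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    """Hypothetical per-partition metadata; not Kafka's actual data model."""
    assigned_replicas: list          # broker ids that hold a replica
    in_sync_replicas: list           # replicas currently in sync
    target_replication_factor: int   # configured factor, tracked separately


def create_offsets_partition(alive_brokers, configured_rf):
    """Model the reported behaviour: the topic is created with as many
    replicas as there are live brokers, capped at the configured factor."""
    actual_rf = min(configured_rf, len(alive_brokers))
    replicas = sorted(alive_brokers)[:actual_rf]
    return PartitionState(replicas, list(replicas), configured_rf)


def is_under_replicated(p):
    """Compare the in-sync set against the target factor, so a partition
    that was created too small is still flagged."""
    return len(p.in_sync_replicas) < p.target_replication_factor


def can_accept_write(p, min_insync_replicas):
    """A write that requires min.insync.replicas acknowledgements cannot
    succeed while the in-sync set is smaller than that setting."""
    return len(p.in_sync_replicas) >= min_insync_replicas


# Only one broker is up when the first consumer triggers topic creation.
partition = create_offsets_partition(alive_brokers=[1], configured_rf=3)
print(partition.assigned_replicas)     # [1]
print(is_under_replicated(partition))  # True
print(can_accept_write(partition, 2))  # False
```

In this model, a check that compares the in-sync set only against the assigned replica list would report nothing wrong, since both contain one broker. Keeping the target factor in the metadata is what lets the shortfall show up for an admin.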