[ https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032285#comment-17032285 ]
Evan Williams edited comment on KAFKA-4084 at 2/7/20 10:44 AM:
---------------------------------------------------------------

[~sql_consulting] We are using min.insync.replicas=1, with replication.factor=3 or above for most topics (6 brokers).

As a side note, one interesting thing the owners of the clients (Streams) have now reported is that certain topics/partitions had no leader, even after a clean shutdown of the bootstrapping broker. So something is quite weird there. What might cause that?

Topic: data.vehicle-topic.journey-dpi  PartitionCount: 6  ReplicationFactor: 3  Configs: cleanup.policy=delete  MarkedForDeletion: true
    Topic: topic.name: 0  Leader: none  Replicas: 54,52,53  Isr: 54  MarkedForDeletion: true
    Topic: topic.name: 1  Leader: none  Replicas: 82,53,54  Isr: 82  MarkedForDeletion: true
    Topic: topic.name: 2  Leader: none  Replicas: 83,54,82  Isr: 83  MarkedForDeletion: true
    Topic: topic.name: 3  Leader: none  Replicas: 84,82,83  Isr: 83  MarkedForDeletion: true
    Topic: topic.name: 4  Leader: none  Replicas: 52,83,84  Isr: 83  MarkedForDeletion: true
    Topic: topic.name: 5  Leader: none  Replicas: 53,84,52  Isr: 53  MarkedForDeletion: true

But yes, I think this scenario makes a clear case for KIP-491: when a broker has under-replicated partitions (URP), simply blacklist it from becoming leader until a given condition is satisfied, or until it is manually removed from the blacklist.

was (Author: blodsbror):
[~sql_consulting] We are using min.insync.replicas=1. And have replication.factor=3 or above for most topics (6 brokers). As a side note, one interesting thing I've seen reported now from the owners of the clients (streams) is that, for certain topics/partitions - they had no leader, even if there was a clean shutdown of the bootstrapping broker. So something is quite weird there.. But yes, I think there is a clear case for KIP-491 in this scenario, of URP to just blacklist a broker from becoming leader until x factor is satisfied, or it's manually removed.
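The KIP-491 idea mentioned in the comment (keep a broker that has just come back with under-replicated partitions from being chosen as leader) can be pictured as a leader-selection rule. The Python below is a minimal sketch under stated assumptions, not Kafka's controller code: `select_leader` and the `deprioritized` set are hypothetical names, and the sketch assumes (a design choice of this example, not a statement about the KIP) that such brokers are moved to the back of the replica order rather than excluded outright, so a partition whose only in-sync replica is deprioritized still gets a leader.

```python
from typing import Optional, Sequence, Set


def select_leader(
    replicas: Sequence[int],
    isr: Set[int],
    live_brokers: Set[int],
    deprioritized: Set[int],
) -> Optional[int]:
    """Pick a leader for one partition (hypothetical KIP-491-style rule).

    Replicas are tried in assignment order (the first one is the preferred
    leader), but brokers in `deprioritized` are moved to the back, so they
    only lead when no other live, in-sync replica exists.
    """
    # Stable reordering: non-deprioritized brokers first, in assignment order.
    ordered = [r for r in replicas if r not in deprioritized] + [
        r for r in replicas if r in deprioritized
    ]
    for broker in ordered:
        # Only a live, in-sync replica is eligible (clean leader election).
        if broker in live_brokers and broker in isr:
            return broker
    return None  # no eligible replica: the partition stays offline


# Broker 54 is the preferred leader but is deprioritized, so 52 is chosen.
print(select_leader([54, 52, 53], {52, 53, 54}, {52, 53, 54}, {54}))  # 52
```

With an ISR of only `{54}`, the same call falls back to 54 instead of leaving the partition leaderless; whether that fallback is desirable is exactly the kind of policy question the KIP discussion would need to settle.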
> automated leader rebalance causes replication downtime for clusters with too
> many partitions
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4084
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4084
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>            Reporter: Tom Crayford
>            Priority: Major
>              Labels: reliability
>             Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and
> you have a cluster with many partitions, there is a severe amount of
> replication downtime following a restart. This causes
> `UnderReplicatedPartitions` to fire, and replication is paused.
>
> This is because the current automated leader rebalance mechanism changes
> leaders for *all* imbalanced partitions at once, instead of doing it
> gradually. This effectively stops all replica fetchers in the cluster
> (assuming there are enough imbalanced partitions), and restarts them. This
> can take minutes on busy clusters, during which no replication is happening
> and user data is at risk. Clients with {{acks=-1}} also see issues at this
> time, because replication is effectively stalled.
>
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election
> manually. There is also a broker configuration “auto.leader.rebalance.enable”
> which you can set to have the broker automatically perform the PLE when
> needed. DO NOT USE THIS OPTION. There are serious performance issues when
> doing so, especially on larger clusters. It needs some development work that
> has not been fully identified yet.
>
> This setting is extremely useful for smaller clusters, but with high
> partition counts causes the huge issues stated above.
>
> One potential fix could be adding a new configuration for the number of
> partitions to do automated leader rebalancing for at once, and *stop* once
> that number of leader rebalances are in flight, until they're done. There may
> be better mechanisms, and I'd love to hear if anybody has any ideas.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
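The all-at-once behaviour described in the quoted issue follows from how the automatic check works: per broker, the controller compares the partitions for which that broker is the preferred (first-listed) replica with those it actually leads, and if the share it does not lead exceeds `leader.imbalance.per.broker.percentage` (default 10), it triggers a preferred replica election for all of them. The Python below is a simplified model of that check, not the controller code; it ignores extra filters the real controller applies (for example, skipping topics queued for deletion), and the function and type names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Partition = Tuple[str, int]  # (topic, partition), illustrative alias


def partitions_to_rebalance(
    assignment: Dict[Partition, Sequence[int]],
    leaders: Dict[Partition, int],
    imbalance_pct: float = 10.0,
) -> List[Partition]:
    """Simplified model of the per-broker leader imbalance check."""
    # Group partitions by their preferred leader (first replica listed).
    preferred_for: Dict[int, List[Partition]] = defaultdict(list)
    for tp, replicas in assignment.items():
        preferred_for[replicas[0]].append(tp)

    selected: List[Partition] = []
    for broker, tps in preferred_for.items():
        # A partition with no recorded leader also counts as "not led".
        not_led = [tp for tp in tps if leaders.get(tp) != broker]
        if len(not_led) / len(tps) > imbalance_pct / 100:
            # No cap: every imbalanced partition of this broker moves at once.
            selected.extend(not_led)
    return sorted(selected)


assignment = {("t", 0): [1, 2], ("t", 1): [1, 2], ("t", 2): [2, 1]}
leaders = {("t", 0): 2, ("t", 1): 2, ("t", 2): 2}
# Broker 1 is preferred for t-0 and t-1 but leads neither (ratio 1.0 > 0.1).
print(partitions_to_rebalance(assignment, leaders))  # [('t', 0), ('t', 1)]
```

On a cluster with thousands of partitions per broker, the `selected` list after a single broker restart can be very large, which matches the replication stall the issue describes.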
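The fix proposed in the last quoted paragraph of the issue (cap the number of leader moves in flight and wait for them to finish before starting more) can be sketched as a small scheduler. `ThrottledRebalancer` and `max_in_flight` are hypothetical names standing in for the suggested new configuration; they are not an existing Kafka class or setting, and what counts as "done" for a leader move (for example, followers having caught up) is left to the caller.

```python
from collections import deque
from typing import Deque, Iterable, List, Set, Tuple

Partition = Tuple[str, int]  # (topic, partition), illustrative alias


class ThrottledRebalancer:
    """Hypothetical sketch: never more than max_in_flight leader moves at once."""

    def __init__(self, candidates: Iterable[Partition], max_in_flight: int) -> None:
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self.pending: Deque[Partition] = deque(candidates)
        self.in_flight: Set[Partition] = set()
        self.max_in_flight = max_in_flight

    def start_more(self) -> List[Partition]:
        """Start elections until the cap is reached; return what was started."""
        started: List[Partition] = []
        while self.pending and len(self.in_flight) < self.max_in_flight:
            tp = self.pending.popleft()
            self.in_flight.add(tp)
            started.append(tp)
        return started

    def complete(self, tp: Partition) -> None:
        """Record that the leader move for `tp` has finished."""
        self.in_flight.discard(tp)

    def done(self) -> bool:
        return not self.pending and not self.in_flight


r = ThrottledRebalancer([("t", i) for i in range(5)], max_in_flight=2)
print(r.start_more())  # [('t', 0), ('t', 1)]
print(r.start_more())  # [] (cap reached, wait for completions)
r.complete(("t", 0))
print(r.start_more())  # [('t', 2)]
```

The trade-off is speed for safety: the cluster takes longer to return to preferred leadership, but replica fetchers are only disrupted for a bounded number of partitions at any moment.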