GEORGE LI created KAFKA-8903:
--------------------------------

Summary: allow the new replica (offset 0) to catch up with current leader using latest offset
Key: KAFKA-8903
URL: https://issues.apache.org/jira/browse/KAFKA-8903
Project: Kafka
Issue Type: Improvement
Components: config, core
Affects Versions: 2.3.0, 1.1.1, 1.1.0
Reporter: GEORGE LI
Assignee: GEORGE LI

In a large Kafka deployment with thousands of brokers, it is very common (and sometimes frequent) for a broker to suffer a hardware failure (disk, memory, CPU, NIC). The failed host is replaced by a new one with the same "broker.id", and the new broker starts up empty, so all of its topic partitions start at offset 0. If the current leader's start offset is > 0, the replaced broker starts fetching the partition from the leader's earliest (start) offset. If the broker hosts many partitions or large partitions, it can take quite some time for the ReplicaFetcher threads to pull the data from all the leaders in the cluster, and the extra load on those brokers/leaders can affect latency and other performance metrics. Once the replaced broker has caught up, preferred leader election can be run to move the leaders back to this broker.

To avoid this performance impact and make the failed-broker replacement process easier and more scalable, we are proposing a new dynamic config, {{replica.start.offset.strategy}}. The default is Earliest, and it can be dynamically set to Latest for a broker (or brokers). If it is set to Latest, then when the empty broker starts up, all partitions start from the latest offset (LEO, LogEndOffset) of the current leader. The replaced broker's replicas are therefore in the ISR with 0 TotalLag/MaxLag and 0 URP (under-replicated partitions) almost instantly.

For preferred leader election, we can wait until the retention time has passed and this replaced broker has been replicating for long enough. The better/safer approach is to enable the Preferred Leader Blacklist described in KAFKA-8638 / KIP-491, so that until this replaced broker has completely caught up, its leadership determination priority is moved to the lowest.
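
As a rough illustration of the behaviour described above, the sketch below models where an empty replica would start fetching under each value of the proposed {{replica.start.offset.strategy}} config, and how much data it would have to copy. It is not Kafka code: the config is only a proposal in this ticket, and the names LeaderLog, initial_fetch_offset, messages_to_copy and holds_full_retained_log are hypothetical helpers introduced for the example. It also assumes, as a simplification, that a non-empty replica simply resumes from its own log end offset.

```python
from dataclasses import dataclass

# Values of the proposed (not yet existing) replica.start.offset.strategy config.
EARLIEST = "Earliest"  # default in the proposal
LATEST = "Latest"


@dataclass
class LeaderLog:
    """Hypothetical view of the leader's log for one partition."""
    log_start_offset: int  # earliest offset still retained on the leader
    log_end_offset: int    # LEO: offset the next appended message will receive


def initial_fetch_offset(strategy: str, leader: LeaderLog, follower_leo: int = 0) -> int:
    """Offset at which a follower replica starts fetching from the leader."""
    if follower_leo > 0:
        # Simplifying assumption: a non-empty replica resumes from its own LEO,
        # so the strategy only matters for an empty (replaced) broker.
        return follower_leo
    if strategy == EARLIEST:
        return leader.log_start_offset
    if strategy == LATEST:
        return leader.log_end_offset
    raise ValueError(f"unknown strategy: {strategy!r}")


def messages_to_copy(strategy: str, leader: LeaderLog, follower_leo: int = 0) -> int:
    """Backlog the ReplicaFetcher must pull before the replica reaches the leader's LEO."""
    return leader.log_end_offset - initial_fetch_offset(strategy, leader, follower_leo)


def holds_full_retained_log(joined_at_ms: int, now_ms: int, retention_ms: int) -> bool:
    """True once the replica has been replicating for at least the retention time.

    After that point, everything the leader still retains was produced after
    the replica joined, so the replica started with Latest is no longer missing data.
    """
    return now_ms - joined_at_ms >= retention_ms


if __name__ == "__main__":
    leader = LeaderLog(log_start_offset=1_000, log_end_offset=5_000)
    for strategy in (EARLIEST, LATEST):
        print(strategy, initial_fetch_offset(strategy, leader), messages_to_copy(strategy, leader))
```

With Latest, the backlog for an empty replica is zero, which is why it would join the ISR almost immediately. The trade-off is that the replica does not hold the older retained data until the retention time has elapsed, so a check along the lines of holds_full_retained_log (or the Preferred Leader Blacklist from KIP-491) would be needed before moving leadership back to it.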