GEORGE LI created KAFKA-8903:
--------------------------------
Summary: allow the new replica (offset 0) to catch up with current
leader using latest offset
Key: KAFKA-8903
URL: https://issues.apache.org/jira/browse/KAFKA-8903
Project: Kafka
Issue Type: Improvement
Components: config, core
Affects Versions: 2.3.0, 1.1.1, 1.1.0
Reporter: GEORGE LI
Assignee: GEORGE LI
It is very common (and sometimes frequent) for a broker to have hardware
failures (disk, memory, CPU, NIC) in a large Kafka deployment with thousands
of brokers. The failed host is replaced by a new one with the same
"broker.id", and the new broker starts up empty. All of its topic partitions
start at offset 0. If the current leader has a log start offset > 0, the
replaced broker starts the partition from the leader's earliest (start) offset.
If the number and size of the partitions hosted on this broker are large, it
can take quite some time for the ReplicaFetcher threads to pull from all the
leaders in the cluster, and the extra fetch load on those leaders can affect
latency and other performance metrics. Once the replaced broker has caught up,
a preferred leader election can be run to move the leaders back to this
broker.
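To give a feel for the scale, here is a rough back-of-the-envelope sketch. The helper name, the partition sizes, and the throughput figure are made-up illustrative values, not measurements from any cluster, and the estimate ignores new data produced while the broker is catching up.

```python
from typing import Iterable

GIB = 1024 ** 3
MIB = 1024 ** 2


def estimate_catch_up_seconds(partition_bytes: Iterable[int],
                              fetch_bytes_per_sec: float) -> float:
    """Hypothetical estimate: seconds needed to copy all retained data
    at a fixed aggregate replication rate."""
    if fetch_bytes_per_sec <= 0:
        raise ValueError("fetch_bytes_per_sec must be positive")
    return sum(partition_bytes) / fetch_bytes_per_sec


# Illustrative only: 200 partitions of 10 GiB each, fetched at 100 MiB/s in total.
partitions = [10 * GIB] * 200
seconds = estimate_catch_up_seconds(partitions, 100 * MIB)
print(f"{seconds / 3600:.1f} hours to catch up from the earliest offset")
```

With the Earliest behaviour every byte still retained on the leaders has to be copied, so the estimate grows linearly with the total retained size.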
To avoid the above performance impact and make the failed-broker replacement
process much easier and more scalable, we are proposing a new dynamic config
{{replica.start.offset.strategy}}. The default is Earliest, and it can be
dynamically set to Latest for a broker (or brokers). If it is set to Latest,
when the empty broker starts up, all partitions start from the latest offset
(LEO, LogEndOffset) of the current leader. The replaced broker's replicas are
then in the ISR with 0 TotalLag/MaxLag and 0 URP almost instantly.
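A minimal sketch of how the proposed strategy could pick the initial fetch offset for an empty replica. The function name and its parameters are illustrative assumptions, not Kafka's actual implementation; only the config values Earliest and Latest come from the proposal.

```python
def initial_fetch_offset(strategy: str,
                         leader_log_start_offset: int,
                         leader_log_end_offset: int) -> int:
    """Hypothetical helper: offset an empty replica starts fetching from."""
    normalized = strategy.strip().lower()
    if normalized == "earliest":
        # Default: replicate everything the leader still retains.
        return leader_log_start_offset
    if normalized == "latest":
        # Proposed option: skip the backlog and start at the leader's LEO.
        return leader_log_end_offset
    raise ValueError(f"unknown replica.start.offset.strategy: {strategy!r}")


# Illustrative leader state: log start offset 1000, log end offset 5000000.
print(initial_fetch_offset("Earliest", 1_000, 5_000_000))
print(initial_fetch_offset("Latest", 1_000, 5_000_000))
```

With Latest the replica has nothing to copy at startup, which is why it would show no lag, but it also holds none of the older records until retention removes them from the other replicas.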
For preferred leader election, we can wait until the retention time has
passed and the replaced broker has been replicating for long enough that it
holds the same retained data as the other replicas. The better/safer approach
is to enable the Preferred Leader Blacklist mentioned in KAFKA-8638 / KIP-491,
so that until the replaced broker is completely caught up, its leadership
priority is moved to the lowest.
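The "moved to the lowest priority" idea can be sketched as follows. This is only an illustration under assumed names (the function and its parameters are hypothetical); the actual design in KIP-491 may differ.

```python
from typing import AbstractSet, Optional, Sequence


def pick_preferred_leader(assigned_replicas: Sequence[int],
                          in_sync_replicas: Sequence[int],
                          deprioritized: AbstractSet[int] = frozenset()) -> Optional[int]:
    """Hypothetical selection: first in-sync replica in assignment order,
    with deprioritized (blacklisted) brokers considered last."""
    isr = set(in_sync_replicas)
    candidates = [broker for broker in assigned_replicas if broker in isr]
    # Stable sort on a boolean key: brokers not in the blacklist keep their
    # assignment order at the front, blacklisted brokers move to the back.
    candidates.sort(key=lambda broker: broker in deprioritized)
    return candidates[0] if candidates else None


# Broker 1 is the replaced broker; it is in the ISR but not trusted as leader yet.
print(pick_preferred_leader([1, 2, 3], [1, 2, 3]))
print(pick_preferred_leader([1, 2, 3], [1, 2, 3], deprioritized={1}))
```

A blacklisted broker can still become leader when it is the only in-sync replica, so availability is not reduced; it is simply chosen last.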
--
This message was sent by Atlassian Jira
(v8.3.2#803003)