Chia-Ping Tsai created KAFKA-20567:
--------------------------------------
Summary: Skip loading UnifiedLog for stray replicas during broker
startup
Key: KAFKA-20567
URL: https://issues.apache.org/jira/browse/KAFKA-20567
Project: Kafka
Issue Type: New Feature
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai
We encountered a case where a broker with a particularly heavy partition
restarts very slowly. The main reason is that the partition is located on a
slow/remote disk, making the scanning of all segment files highly expensive,
even if it merely touches the file metadata.
We removed the broker from the replica assignment, expecting the partition to
be marked as a stray replica and skipped during startup. However, the current
stray replica detection mechanism requires a fully initialized UnifiedLog
instance for evaluation. This pushes us back into the exact same bottleneck.
It appears we need a lightweight stray replica detection mechanism that
requires only the topic id and partition id rather than a whole UnifiedLog.
This information is sufficient to check whether the broker is still hosting
the replica, allowing us to safely skip loading the UnifiedLog if the replica
is a stray.
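
A minimal sketch of what such a check could look like, to make the idea concrete. Every name here (StrayReplicaCheck, TopicIdPartition, isStray, parsePartition) is hypothetical and not an existing Kafka API. The sketch assumes the topic id has already been obtained cheaply (for example from the partition.metadata file in the log directory), that the partition index can be parsed from a directory name of the form <topic>-<partition>, and that the set of replicas assigned to this broker is available from the cluster metadata. It uses java.util.UUID as a stand-in for Kafka's own id type.

```java
import java.util.Optional;
import java.util.Set;
import java.util.UUID;

// Hypothetical illustration only; none of these names exist in Kafka.
final class StrayReplicaCheck {

    /** Identity of a replica: topic id plus partition index. */
    record TopicIdPartition(UUID topicId, int partition) {}

    /**
     * Extracts the partition index from a directory name shaped like
     * "<topic>-<partition>". Returns empty if the name does not match.
     */
    static Optional<Integer> parsePartition(String dirName) {
        int dash = dirName.lastIndexOf('-');
        if (dash <= 0 || dash == dirName.length() - 1) {
            return Optional.empty();
        }
        try {
            return Optional.of(Integer.parseInt(dirName.substring(dash + 1)));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /**
     * Decides whether a log directory belongs to a stray replica using only
     * the topic id and the directory name, without opening any segment file.
     *
     * @param topicId  topic id read from cheap on-disk metadata (assumption)
     * @param dirName  name of the log directory
     * @param assigned replicas the metadata says this broker should host
     */
    static boolean isStray(UUID topicId, String dirName, Set<TopicIdPartition> assigned) {
        Optional<Integer> partition = parsePartition(dirName);
        // Conservative choice: an unparseable name is never reported as
        // stray, so the caller falls back to the normal load path.
        if (partition.isEmpty()) {
            return false;
        }
        return !assigned.contains(new TopicIdPartition(topicId, partition.get()));
    }
}
```

In this sketch, a directory is skipped only when the check positively identifies it as unassigned; anything ambiguous still goes through the regular UnifiedLog loading path, which keeps the optimization from hiding a replica the broker should host.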
--
This message was sent by Atlassian Jira
(v8.20.10#820010)