Chia-Ping Tsai created KAFKA-20567:
--------------------------------------

             Summary: Skip loading UnifiedLog for stray replicas during broker startup
                 Key: KAFKA-20567
                 URL: https://issues.apache.org/jira/browse/KAFKA-20567
             Project: Kafka
          Issue Type: New Feature
            Reporter: Chia-Ping Tsai
            Assignee: Chia-Ping Tsai


We encountered a case where a broker with a particularly heavy partition 
restarts very slowly. The main reason is that the partition is located on a 
slow/remote disk, making the scanning of all segment files highly expensive, 
even if it merely touches the file metadata.

We removed the broker from the replica assignment, expecting the partition to 
be marked as a stray replica and skipped during startup. However, the current 
stray replica detection mechanism requires a fully initialized UnifiedLog 
instance for evaluation. This pushes us back into the exact same bottleneck.

It appears we need a lightweight stray replica detection mechanism that only 
requires the topic id and partition id instead of a whole UnifiedLog. This 
information is sufficient to check if the broker is still hosting the replica, 
allowing us to safely skip loading the UnifiedLog if it is a stray.
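
A minimal sketch of what such a check might look like, to make the idea 
concrete. Everything here is hypothetical and not existing Kafka code: the 
ReplicaId class, the isStray and replicasToLoad helpers, and the use of a plain 
String as a stand-in for the topic id type. It assumes the topic id and 
partition id can be obtained without opening any segment files, and that the 
broker already knows which partitions are assigned to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StrayReplicaSketch {

    /** Hypothetical lightweight identity of an on-disk replica; no UnifiedLog needed. */
    static final class ReplicaId {
        final String topicId;   // stand-in for the real topic id type
        final int partition;

        ReplicaId(String topicId, int partition) {
            this.topicId = topicId;
            this.partition = partition;
        }
    }

    /**
     * Returns true if the replica is not assigned to this broker.
     * assignedToThisBroker maps topic id to the partition ids this broker
     * should host (assumed to come from the current cluster metadata).
     */
    static boolean isStray(ReplicaId replica, Map<String, Set<Integer>> assignedToThisBroker) {
        Set<Integer> partitions = assignedToThisBroker.get(replica.topicId);
        return partitions == null || !partitions.contains(replica.partition);
    }

    /** Keeps only the replicas whose logs are worth loading at startup. */
    static List<ReplicaId> replicasToLoad(List<ReplicaId> onDisk,
                                          Map<String, Set<Integer>> assignedToThisBroker) {
        List<ReplicaId> toLoad = new ArrayList<>();
        for (ReplicaId replica : onDisk) {
            if (!isStray(replica, assignedToThisBroker)) {
                toLoad.add(replica); // only these would go on to build a full log instance
            }
        }
        return toLoad;
    }
}
```

With a check of this shape, the expensive segment scan would only run for 
replicas that pass the filter. Stray directories could be handled separately 
without being loaded first.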



