[ https://issues.apache.org/jira/browse/KAFKA-20567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080253#comment-18080253 ]

Chia-Ping Tsai commented on KAFKA-20567:
----------------------------------------

> We can move the related check before UnifiedLog#create [1]

It would be better to keep the original stray-replica mechanism. Rather than 
moving the existing check, this JIRA should introduce an additional 'loose' 
mechanism.

> Skip loading UnifiedLog for stray replicas during broker startup
> ----------------------------------------------------------------
>
>                 Key: KAFKA-20567
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20567
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Chia-Ping Tsai
>            Assignee: Chia-Ping Tsai
>            Priority: Major
>
> We encountered a case where a broker with a particularly heavy partition 
> restarted very slowly. The main reason is that the partition is located on a 
> slow/remote disk, which makes scanning all of its segment files highly 
> expensive, even when the scan merely touches file metadata.
> We removed the broker from the replica assignment, expecting the partition to 
> be marked as a stray replica and skipped during startup. However, the current 
> stray replica detection mechanism requires a fully initialized UnifiedLog 
> instance for evaluation. This pushes us back into the exact same bottleneck.
> It appears we need a lightweight stray replica detection mechanism that only 
> requires the topic id and partition id instead of a whole UnifiedLog. This 
> information is sufficient to check whether the broker is still hosting the 
> replica, allowing us to safely skip loading the UnifiedLog if it is stray.
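A minimal sketch of the lightweight check proposed above, under stated assumptions: the broker already knows from its metadata which (topic id, partition) pairs are assigned to it, and each on-disk replica's topic id and partition index can be obtained without opening any segment files. `StrayReplicaSketch`, `ReplicaId`, `isStray` and `replicasToLoad` are hypothetical names, not Kafka APIs, and `java.util.UUID` stands in for Kafka's own `Uuid` type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public final class StrayReplicaSketch {

    /** Hypothetical on-disk replica identity: only what the assignment check needs. */
    public record ReplicaId(UUID topicId, int partition) {}

    /**
     * Hypothetical 'loose' check: the replica is stray if the broker's current
     * metadata no longer assigns (topicId, partition) to this broker.
     * Only the topic id and partition index are consulted; no segments are read.
     */
    public static boolean isStray(ReplicaId replica,
                                  Map<UUID, Set<Integer>> assignedToThisBroker) {
        Set<Integer> partitions = assignedToThisBroker.get(replica.topicId());
        return partitions == null || !partitions.contains(replica.partition());
    }

    /** Returns the replicas that should still be loaded as full logs. */
    public static List<ReplicaId> replicasToLoad(List<ReplicaId> onDisk,
                                                 Map<UUID, Set<Integer>> assignedToThisBroker) {
        List<ReplicaId> toLoad = new ArrayList<>();
        for (ReplicaId replica : onDisk) {
            if (isStray(replica, assignedToThisBroker)) {
                // Skip before any log object is built, so the expensive segment scan never runs.
                continue;
            }
            toLoad.add(replica);
        }
        return toLoad;
    }

    public static void main(String[] args) {
        UUID topic = UUID.fromString("00000000-0000-0000-0000-000000000001");
        // Assumed metadata view: this broker hosts partitions 0 and 2 of the topic.
        Map<UUID, Set<Integer>> assigned = Map.of(topic, Set.of(0, 2));
        List<ReplicaId> onDisk = List.of(
            new ReplicaId(topic, 0),
            new ReplicaId(topic, 1), // broker was removed from this partition's assignment
            new ReplicaId(topic, 2));
        System.out.println(replicasToLoad(onDisk, assigned));
    }
}
```

In line with the comment above, such a pre-check would be additive rather than a replacement: replicas it does not skip would still pass through the existing stray detection that runs on a fully initialized UnifiedLog.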



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
