[ 
https://issues.apache.org/jira/browse/KAFKA-20567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080240#comment-18080240
 ] 

PoAn Yang commented on KAFKA-20567:
-----------------------------------

The idea is good. The stray check only uses the topic id and partition to decide 
whether a log is stray [0], so we can move that check before UnifiedLog#create 
[1]. If no one is working on this issue, I am happy to handle it.

 

[0] 
https://github.com/apache/kafka/blob/62db165d2af99c489010688fcaa4addf4c398964/core/src/main/scala/kafka/server/metadata/BrokerMetadataPublisher.scala#L315-L328

[1] 
https://github.com/apache/kafka/blob/62db165d2af99c489010688fcaa4addf4c398964/storage/src/main/java/org/apache/kafka/storage/internals/log/LogManager.java#L524-L546
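
To make the proposal concrete, here is a rough sketch of a stray check that needs only a topic id and a partition number, so it could run before any UnifiedLog is created. This is not the Kafka implementation: StrayCheckSketch, isStray, and the map of hosted partitions are hypothetical stand-ins for the real metadata image and LogManager code, and it assumes the topic id of a log directory can be read without scanning its segments.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch, not Kafka code: none of these names exist in Kafka.
final class StrayCheckSketch {

    /**
     * Returns true when this broker is no longer assigned the given partition.
     *
     * @param topicId          topic id of the on-disk log directory (assumed to be
     *                         readable without loading the segments)
     * @param partition        partition number of the log directory
     * @param hostedPartitions assumed stand-in for the metadata image: for each
     *                         topic id, the partitions this broker should host
     */
    static boolean isStray(UUID topicId, int partition,
                           Map<UUID, Set<Integer>> hostedPartitions) {
        // Unknown topic id or unassigned partition: the log is stray.
        Set<Integer> hosted = hostedPartitions.getOrDefault(topicId, Set.of());
        return !hosted.contains(partition);
    }
}
```

With a check like this, the startup path could skip log creation for any directory where isStray returns true, which avoids the expensive segment scan for that partition.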

> Skip loading UnifiedLog for stray replicas during broker startup
> ----------------------------------------------------------------
>
>                 Key: KAFKA-20567
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20567
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Chia-Ping Tsai
>            Assignee: Chia-Ping Tsai
>            Priority: Major
>
> We encountered a case where a broker with a particularly heavy partition 
> restarts very slowly. The main reason is that the partition is located on a 
> slow/remote disk, making the scanning of all segment files highly expensive, 
> even if it merely touches the file metadata.
> We removed the broker from the replica assignment, expecting the partition to 
> be marked as a stray replica and skipped during startup. However, the current 
> stray replica detection mechanism requires a fully initialized UnifiedLog 
> instance for evaluation. This pushes us back into the exact same bottleneck.
> It appears we need a lightweight stray replica detection mechanism that only 
> requires the topic id and partition id instead of a whole UnifiedLog. This 
> information is sufficient to check if the broker is still hosting the 
> replica, allowing us to safely skip loading the UnifiedLog if it is a stray.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
