[
https://issues.apache.org/jira/browse/KAFKA-20567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080240#comment-18080240
]
PoAn Yang commented on KAFKA-20567:
-----------------------------------
The idea is good. The stray-detection function only uses the topic id and
partition to check whether a log is stray [0], so we can move the related check
before UnifiedLog#create [1]. If no one is working on this issue, I am happy to
handle it.
[0]
https://github.com/apache/kafka/blob/62db165d2af99c489010688fcaa4addf4c398964/core/src/main/scala/kafka/server/metadata/BrokerMetadataPublisher.scala#L315-L328
[1]
https://github.com/apache/kafka/blob/62db165d2af99c489010688fcaa4addf4c398964/storage/src/main/java/org/apache/kafka/storage/internals/log/LogManager.java#L524-L546
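To make the proposal concrete, here is a rough, self-contained sketch of the
idea, not the actual Kafka code. StrayCheckSketch, LogIdentity, isStray and
selectLogsToLoad are hypothetical names, and the nested map stands in for
whatever the broker knows about replica assignments from metadata. It only
illustrates that the stray decision needs the topic id and partition, so it
can run before any segment file is touched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class StrayCheckSketch {

    /** Hypothetical identity of a log directory: all the stray check needs. */
    record LogIdentity(UUID topicId, int partition) {}

    /**
     * Returns true if this broker no longer hosts the replica.
     * assignments is an assumed shape: topic id -> partition -> replica broker ids.
     */
    static boolean isStray(int brokerId,
                           LogIdentity log,
                           Map<UUID, Map<Integer, List<Integer>>> assignments) {
        Map<Integer, List<Integer>> partitions = assignments.get(log.topicId());
        if (partitions == null) {
            return true; // topic is not present in the metadata
        }
        List<Integer> replicas = partitions.get(log.partition());
        if (replicas == null) {
            return true; // partition is not present in the metadata
        }
        return !replicas.contains(brokerId); // broker was removed from the assignment
    }

    /**
     * Filters candidates before the expensive load step. In the real code the
     * surviving entries would be the ones passed on to UnifiedLog#create.
     */
    static List<LogIdentity> selectLogsToLoad(int brokerId,
                                              List<LogIdentity> candidates,
                                              Map<UUID, Map<Integer, List<Integer>>> assignments) {
        List<LogIdentity> toLoad = new ArrayList<>();
        for (LogIdentity candidate : candidates) {
            if (!isStray(brokerId, candidate, assignments)) {
                toLoad.add(candidate);
            }
        }
        return toLoad;
    }

    public static void main(String[] args) {
        UUID topic = UUID.randomUUID();
        // Partition 0 is still assigned to broker 1; partition 1 is not.
        Map<UUID, Map<Integer, List<Integer>>> assignments =
            Map.of(topic, Map.of(0, List.of(1, 2), 1, List.of(2, 3)));
        List<LogIdentity> onDisk =
            List.of(new LogIdentity(topic, 0), new LogIdentity(topic, 1));
        System.out.println(selectLogsToLoad(1, onDisk, assignments));
    }
}
```

With this ordering, the heavy partition on the slow disk would be dropped from
the candidate list as soon as the broker is removed from its assignment, and
the segment scan would never start for it.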
> Skip loading UnifiedLog for stray replicas during broker startup
> ----------------------------------------------------------------
>
> Key: KAFKA-20567
> URL: https://issues.apache.org/jira/browse/KAFKA-20567
> Project: Kafka
> Issue Type: New Feature
> Reporter: Chia-Ping Tsai
> Assignee: Chia-Ping Tsai
> Priority: Major
>
> We encountered a case where a broker with a particularly heavy partition
> restarts very slowly. The main reason is that the partition is located on a
> slow/remote disk, making the scanning of all segment files highly expensive,
> even if it merely touches the file metadata.
> We removed the broker from the replica assignment, expecting the partition to
> be marked as a stray replica and skipped during startup. However, the current
> stray replica detection mechanism requires a fully initialized UnifiedLog
> instance for evaluation. This pushes us back into the exact same bottleneck.
> It appears we need a lightweight stray replica detection mechanism that only
> requires the topic id and partition id instead of a whole UnifiedLog. This
> information is sufficient to check if the broker is still hosting the
> replica, allowing us to safely skip loading the UnifiedLog if it is a stray
> replica.