[
https://issues.apache.org/jira/browse/KAFKA-20567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080253#comment-18080253
]
Chia-Ping Tsai commented on KAFKA-20567:
----------------------------------------
> We can move related check before UnifiedLog#create [1]
It would be better to keep the original stray mechanism. This JIRA should
introduce an additional 'loose' mechanism instead.
> Skip loading UnifiedLog for stray replicas during broker startup
> ----------------------------------------------------------------
>
> Key: KAFKA-20567
> URL: https://issues.apache.org/jira/browse/KAFKA-20567
> Project: Kafka
> Issue Type: New Feature
> Reporter: Chia-Ping Tsai
> Assignee: Chia-Ping Tsai
> Priority: Major
>
> We encountered a case where a broker with a particularly heavy partition
> restarts very slowly. The main reason is that the partition is located on a
> slow/remote disk, making the scanning of all segment files highly expensive,
> even if it merely touches the file metadata.
> We removed the broker from the replica assignment, expecting the partition to
> be marked as a stray replica and skipped during startup. However, the current
> stray replica detection mechanism requires a fully initialized UnifiedLog
> instance for evaluation. This pushes us back into the exact same bottleneck.
> It appears we need a lightweight stray replica detection mechanism that only
> requires the topic id and partition id instead of a whole UnifiedLog. This
> information is sufficient to check if the broker is still hosting the
> replica, allowing us to safely skip loading the UnifiedLog if it is a stray.
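
The lightweight check described above can be sketched roughly as follows. This
is only an illustration under stated assumptions, not Kafka's actual code:
StrayCheckSketch, ReplicaId, isStray and replicasToLoad are hypothetical names,
java.util.UUID stands in for Kafka's own topic id type, and the nested map is
an assumed simplification of whatever assignment view the broker has at
startup. How the topic id and partition id are obtained cheaply (for example
from the log directory name and a small per-partition metadata file) is left
out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch: none of these types or methods are Kafka's real API.
final class StrayCheckSketch {

    // Identity of an on-disk replica, known without opening any segment file.
    record ReplicaId(UUID topicId, int partition) {}

    // Assumed assignment view: topic id -> partition id -> replica broker ids.
    // A replica is stray if its topic or partition is unknown, or if this
    // broker is not in the partition's replica set.
    static boolean isStray(ReplicaId replica,
                           int brokerId,
                           Map<UUID, Map<Integer, Set<Integer>>> assignments) {
        Map<Integer, Set<Integer>> partitions = assignments.get(replica.topicId());
        if (partitions == null) {
            return true;
        }
        Set<Integer> replicas = partitions.get(replica.partition());
        return replicas == null || !replicas.contains(brokerId);
    }

    // Filters the on-disk replicas down to those worth loading. The expensive
    // step (building the full log object) would run only for the result.
    static List<ReplicaId> replicasToLoad(List<ReplicaId> onDisk,
                                          int brokerId,
                                          Map<UUID, Map<Integer, Set<Integer>>> assignments) {
        List<ReplicaId> toLoad = new ArrayList<>();
        for (ReplicaId replica : onDisk) {
            if (!isStray(replica, brokerId, assignments)) {
                toLoad.add(replica);
            }
        }
        return toLoad;
    }

    public static void main(String[] args) {
        UUID topic = UUID.randomUUID();
        // Partition 0 is still assigned to broker 1; partition 1 no longer is.
        Map<UUID, Map<Integer, Set<Integer>>> assignments =
            Map.of(topic, Map.of(0, Set.of(1, 2), 1, Set.of(2, 3)));
        List<ReplicaId> onDisk =
            List.of(new ReplicaId(topic, 0), new ReplicaId(topic, 1));
        System.out.println(replicasToLoad(onDisk, 1, assignments));
    }
}
```

Because the decision depends only on the two ids and the assignment view, it
can run before any log object is constructed, which is the property the issue
asks for. Whether such a check replaces the existing detection or sits next to
it as an additional mechanism is a separate design question.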