[
https://issues.apache.org/jira/browse/KAFKA-16234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gaurav Narula reassigned KAFKA-16234:
-------------------------------------
Assignee: Omnia Ibrahim
> Log directory failure re-creates partitions in another logdir automatically
> ---------------------------------------------------------------------------
>
> Key: KAFKA-16234
> URL: https://issues.apache.org/jira/browse/KAFKA-16234
> Project: Kafka
> Issue Type: Bug
> Components: jbod
> Affects Versions: 3.7.0
> Reporter: Gaurav Narula
> Assignee: Omnia Ibrahim
> Priority: Major
>
> With [KAFKA-16157|https://github.com/apache/kafka/pull/15263] we changed the
> {{HostedPartition.Offline}} enum variant to embed a {{Partition}} object.
> Further, {{ReplicaManager::getOrCreatePartition}} tries to compare the old
> and new topicIds to decide if it needs to create a new log.
> The getter for {{Partition::topicId}} relies on retrieving the topicId from
> the {{log}} field or {{logManager.currentLogs}}. The former is set to {{None}}
> when a partition is marked offline and the key for the partition is removed
> from the latter by {{LogManager::handleLogDirFailure}}. Therefore, topicId
> for a partition marked offline always returns {{None}} and new logs for all
> partitions in a failed log directory are always created on another disk.
> The broker will fail to restart after the failed disk is repaired because
> the same partitions will exist in two different directories. The error does,
> however, inform the operator to remove the partitions from the disk that
> failed, which should help with broker startup.
> We can avoid this with
> [KAFKA-16212|https://issues.apache.org/jira/browse/KAFKA-16212] but in the
> short term, an immediate solution can be to have the {{Partition}} object
> accept {{Option[TopicId]}} in its constructor and have it fall back to
> {{log}} or {{logManager}} if it's unset.
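The failure mode and the proposed short-term fix can be sketched in a few lines of Scala. This is only an illustration under stated assumptions, not the actual patch: `TopicPartition`, `Log`, `LogManager` and `Partition` below are simplified, hypothetical stand-ins for the Kafka classes of the same name, `knownTopicId` and `markOffline` are made-up names, and `java.util.UUID` stands in for Kafka's topic ID type.

```scala
import java.util.UUID
import scala.collection.mutable

// Hypothetical, simplified stand-ins for the Kafka classes named in the issue.
final case class TopicPartition(topic: String, partition: Int)
final class Log(val topicId: Option[UUID])

final class LogManager {
  val currentLogs: mutable.Map[TopicPartition, Log] = mutable.Map.empty

  // Models the effect described in the issue: partitions that lived in the
  // failed directory are dropped from currentLogs.
  def handleLogDirFailure(failed: Iterable[TopicPartition]): Unit =
    failed.foreach(tp => currentLogs.remove(tp))
}

final class Partition(
  val topicPartition: TopicPartition,
  logManager: LogManager,
  knownTopicId: Option[UUID] = None // assumption: the proposed constructor argument
) {
  @volatile var log: Option[Log] = None

  // Prefer the ID supplied at construction; fall back to log, then logManager.
  def topicId: Option[UUID] =
    knownTopicId
      .orElse(log.flatMap(_.topicId))
      .orElse(logManager.currentLogs.get(topicPartition).flatMap(_.topicId))

  // Marking the partition offline clears its log reference.
  def markOffline(): Unit = { log = None }
}

object TopicIdFallbackDemo {
  def main(args: Array[String]): Unit = {
    val tp = TopicPartition("orders", 0)
    val id = UUID.randomUUID()
    val logManager = new LogManager
    val log = new Log(Some(id))
    logManager.currentLogs.put(tp, log)

    // Pre-fix behaviour: topicId is derived only from log / logManager.
    val derivedOnly = new Partition(tp, logManager)
    derivedOnly.log = Some(log)

    // Proposed behaviour: topicId is also passed to the constructor.
    val withKnownId = new Partition(tp, logManager, Some(id))
    withKnownId.log = Some(log)

    // Simulate the log directory failure.
    logManager.handleLogDirFailure(Seq(tp))
    derivedOnly.markOffline()
    withKnownId.markOffline()

    println(s"derived only:  ${derivedOnly.topicId}")  // both sources are gone
    println(s"with known id: ${withKnownId.topicId}")  // still resolvable
  }
}
```

In this model, the partition built without an explicit ID loses its topicId once the directory fails, so a comparison of old and new topicIds has nothing to match against. The partition built with the ID keeps it, which is the property the proposed constructor change is meant to provide.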
--
This message was sent by Atlassian Jira
(v8.20.10#820010)