[
https://issues.apache.org/jira/browse/KAFKA-15649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836080#comment-17836080
]
Viktor Somogyi-Vass edited comment on KAFKA-15649 at 4/11/24 9:37 AM:
----------------------------------------------------------------------
[~soarez] I've uploaded a PR to start the review process early. I would still
like to do some manual testing, but would be happy to receive feedback
once you have time for this.
was (Author: viktorsomogyi):
[~soarez] I've uploaded a PR to start the review process early. I still would
like to do some manual testing too but would be happy to receive some feedback.
> Handle directory failure timeout
> ---------------------------------
>
> Key: KAFKA-15649
> URL: https://issues.apache.org/jira/browse/KAFKA-15649
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Igor Soarez
> Assignee: Viktor Somogyi-Vass
> Priority: Minor
>
> If a broker with an offline log directory repeatedly fails to notify the
> controller of either:
> * the fact that the directory is offline; or
> * any replica assignment into a failed directory
> then the controller will not check whether a leadership change is required,
> and this may leave partitions offline indefinitely.
> KIP-858 proposes that the broker should shut down after a configurable
> timeout to force a leadership change. Alternatively, the broker could also
> request to be fenced, as long as there's a path for it to later become
> unfenced.
> While this unavailability is possible in theory, in practice it is hard to
> imagine a scenario where a broker continues to appear healthy to the
> controller but fails to send this information. So it is not clear whether
> this is a real problem.
>
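The timeout behaviour described in the issue could be sketched roughly as follows. This is only an illustration and not the code from the PR or from Kafka itself: the class DirectoryFailureTracker, its method names, and the timeoutMs parameter are all hypothetical. It records when each directory failure was first seen, forgets a failure once the controller has acknowledged it, and reports that the broker should shut down when any failure has stayed unacknowledged for the configured timeout.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Kafka's implementation: tracks log directory
 * failures that the controller has not yet acknowledged.
 */
class DirectoryFailureTracker {
    // Assumed configurable timeout, in milliseconds.
    private final long timeoutMs;
    // Directory path -> time (ms) at which the failure was first observed.
    private final Map<String, Long> unacknowledged = new HashMap<>();

    DirectoryFailureTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Record a failure; the earliest timestamp is kept if already recorded. */
    synchronized void onDirectoryFailed(String dir, long nowMs) {
        unacknowledged.putIfAbsent(dir, nowMs);
    }

    /** The controller confirmed that it knows this directory is offline. */
    synchronized void onControllerAcknowledged(String dir) {
        unacknowledged.remove(dir);
    }

    /** True if any failure has stayed unacknowledged for at least timeoutMs. */
    synchronized boolean shouldShutDown(long nowMs) {
        for (long failedAt : unacknowledged.values()) {
            if (nowMs - failedAt >= timeoutMs) {
                return true;
            }
        }
        return false;
    }
}
```

Passing the current time in as an argument keeps the sketch deterministic. A real broker would more likely check this from a scheduled task and, per the alternative mentioned in the issue, could request fencing instead of shutting down.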
--
This message was sent by Atlassian Jira
(v8.20.10#820010)