[jira] [Comment Edited] (KAFKA-15649) Handle directory failure timeout

Viktor Somogyi-Vass (Jira) Thu, 11 Apr 2024 02:39:12 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-15649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836080#comment-17836080
 ]


Viktor Somogyi-Vass edited comment on KAFKA-15649 at 4/11/24 9:37 AM:
----------------------------------------------------------------------

[~soarez] I've uploaded a PR to start the review process early. I still would 
like to do some manual testing too but would be happy to receive some feedback 
once you get some time for this.


was (Author: viktorsomogyi):
[~soarez] I've uploaded a PR to start the review process early. I still would 
like to do some manual testing too but would be happy to receive some feedback.

> Handle directory failure timeout 
> ---------------------------------
>
>                 Key: KAFKA-15649
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15649
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Igor Soarez
>            Assignee: Viktor Somogyi-Vass
>            Priority: Minor
>
> If a broker with an offline log directory repeatedly fails to notify the 
> controller of either:
>  * the fact that the directory is offline; or
>  * any replica assignment into a failed directory,
> then the controller will not check whether a leadership change is required, 
> and this may leave partitions offline indefinitely.
> KIP-858 proposes that the broker should shut down after a configurable 
> timeout to force a leadership change. Alternatively, the broker could also 
> request to be fenced, as long as there's a path for it to later become 
> unfenced.
> While this unavailability is possible in theory, in practice it's not easy to 
> conceive of a scenario where a broker continues to appear healthy to the 
> controller, but fails to send this information. So it's not clear if this is 
> a real problem. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
