[ 
https://issues.apache.org/jira/browse/KAFKA-16297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez updated KAFKA-16297:
--------------------------------
    Component/s: jbod

> Race condition while promoting future replica can lead to partition 
> unavailability.
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-16297
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16297
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: jbod
>            Reporter: Igor Soarez
>            Assignee: Igor Soarez
>            Priority: Major
>             Fix For: 3.7.1
>
>
> KIP-858 proposed that when a directory failure occurs after changing the 
> assignment of a replica that's moved between two directories in the same 
> broker, but before the future replica promotion completes, the broker should 
> reassign the replica to inform the controller of its correct status. But this 
> hasn't yet been implemented, and without it this failure may lead to 
> indefinite partition unavailability.
> Example scenario:
>  # A broker which leads partition P receives a request to alter the replica 
> from directory A to directory B.
>  # The broker creates a future replica in directory B and starts a replica 
> fetcher.
>  # Once the future replica first catches up, the broker queues a reassignment 
> to inform the controller of the directory change.
>  # The next time the replica catches up, the broker briefly blocks appends 
> and promotes the replica. However, before the promotion is attempted, 
> directory A fails.
>  # The controller was informed that P in now in directory B before it 
> received the notification that directory A has failed, so it does not elect a 
> new leader, and as long as the broker is online, partition A remains 
> unavailable.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to