[ https://issues.apache.org/jira/browse/KAFKA-16297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Igor Soarez updated KAFKA-16297:
--------------------------------
    Description:
KIP-858 proposed that when a directory failure occurs after changing the assignment of a replica that's moved between two directories in the same broker, but before the future replica promotion completes, the broker should reassign the replica to inform the controller of its correct status. But this hasn't yet been implemented, and without it this failure may lead to indefinite partition unavailability.

Example scenario:
 # A broker which leads partition P receives a request to alter the replica from directory A to directory B.
 # The broker creates a future replica in directory B and starts a replica fetcher.
 # Once the future replica first catches up, the broker queues a reassignment to inform the controller of the directory change.
 # The next time the replica catches up, the broker briefly blocks appends and promotes the replica. However, before the promotion is attempted, directory A fails.
 # The controller was informed that P is now in directory B before it received the notification that directory A has failed, so it does not elect a new leader, and as long as the broker is online, partition P remains unavailable.

  was:
KIP-858 proposed that when a directory failure occurs after changing the assignment of a replica that's moved between two directories in the same broker, but before the future replica promotion completes, the broker should reassign the replica to inform the controller of its correct status. But this hasn't yet been implemented, and without it this failure may lead to indefinite partition unavailability.

Example scenario:
 # A broker which leads partition P receives a request to alter the replica from directory A to directory B.
 # The broker creates a future replica in directory B and starts a replica fetcher.
 # Once the future replica first catches up, the broker queues a reassignment to inform the controller of the directory change.
 # The next time the replica catches up, the broker briefly blocks appends and promotes the replica. However, before the promotion is attempted, directory A fails.
 # The controller was informed that P is now in directory B before it received the notification that directory A has failed, so it does not elect a new leader, and as long as the broker is online, partition P remains unavailable.

As per KIP-858, the broker should detect this scenario and queue a reassignment of P into directory ID {{DirectoryId.LOST}}.


> Race condition while promoting future replica can lead to partition unavailability.
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-16297
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16297
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Igor Soarez
>            Assignee: Igor Soarez
>            Priority: Major
>
> KIP-858 proposed that when a directory failure occurs after changing the assignment of a replica that's moved between two directories in the same broker, but before the future replica promotion completes, the broker should reassign the replica to inform the controller of its correct status. But this hasn't yet been implemented, and without it this failure may lead to indefinite partition unavailability.
>
> Example scenario:
> # A broker which leads partition P receives a request to alter the replica from directory A to directory B.
> # The broker creates a future replica in directory B and starts a replica fetcher.
> # Once the future replica first catches up, the broker queues a reassignment to inform the controller of the directory change.
> # The next time the replica catches up, the broker briefly blocks appends and promotes the replica. However, before the promotion is attempted, directory A fails.
> # The controller was informed that P is now in directory B before it received the notification that directory A has failed, so it does not elect a new leader, and as long as the broker is online, partition P remains unavailable.
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
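
For illustration only, the ordering problem in the example scenario can be replayed with a toy model. This is a minimal sketch and not Kafka code: the `Controller` class, the `replay` method and the string directory names are all hypothetical, `LOST` is a plain string standing in for the reserved directory ID the issue refers to, and the rule that a replica assigned to `LOST` counts as offline is an assumption about how the controller reacts.

```java
import java.util.HashSet;
import java.util.Set;

public class FutureReplicaRaceSketch {
    // Hypothetical stand-in for the reserved directory ID named in the issue.
    static final String LOST = "LOST";

    /** Toy controller: tracks where it believes P lives and which directories failed. */
    static class Controller {
        String assignedDir = "A";                 // P starts in directory A
        final Set<String> failedDirs = new HashSet<>();

        void onAssignment(String dir) { assignedDir = dir; }
        void onDirectoryFailure(String dir) { failedDirs.add(dir); }

        // Assumption: the replica is treated as offline (so a new leader is elected)
        // if its assigned directory has failed or is the LOST placeholder.
        boolean electsNewLeader() {
            return LOST.equals(assignedDir) || failedDirs.contains(assignedDir);
        }
    }

    /** Replays the scenario and reports whether the controller elects a new leader. */
    static boolean replay(boolean brokerQueuesLostReassignment) {
        Controller controller = new Controller();
        controller.onAssignment("B");            // step 3: reassignment A -> B arrives first
        final boolean promoted = false;          // step 4: A fails before the promotion
        controller.onDirectoryFailure("A");      // step 5: failure notification arrives second
        if (brokerQueuesLostReassignment && !promoted) {
            controller.onAssignment(LOST);       // corrective reassignment queued by the broker
        }
        return controller.electsNewLeader();
    }

    public static void main(String[] args) {
        System.out.println("without corrective reassignment: " + replay(false)); // false
        System.out.println("with corrective reassignment:    " + replay(true));  // true
    }
}
```

Without the corrective reassignment the toy controller still believes P is in the healthy directory B, so nothing triggers an election; with it, the controller sees P as offline and can react.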