[ https://issues.apache.org/jira/browse/KAFKA-16297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Igor Soarez reassigned KAFKA-16297:
-----------------------------------

    Assignee: Igor Soarez

> Race condition while promoting future replica can lead to partition unavailability.
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-16297
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16297
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Igor Soarez
>            Assignee: Igor Soarez
>            Priority: Major
>
> KIP-858 proposed that when a directory failure occurs after changing the
> assignment of a replica that's moved between two directories in the same
> broker, but before the future replica promotion completes, the broker should
> reassign the replica to inform the controller of its correct status. This
> has not yet been implemented, and without it such a failure may lead to
> indefinite partition unavailability.
>
> Example scenario:
> # A broker which leads partition P receives a request to alter the replica
> from directory A to directory B.
> # The broker creates a future replica in directory B and starts a replica
> fetcher.
> # Once the future replica first catches up, the broker queues a reassignment
> to inform the controller of the directory change.
> # The next time the replica catches up, the broker briefly blocks appends
> and promotes the replica. However, before the promotion is attempted,
> directory A fails.
> # The controller was informed that P is now in directory B before it
> received the notification that directory A has failed, so it does not elect a
> new leader, and as long as the broker is online, partition P remains
> unavailable.
>
> As per KIP-858, the broker should detect this scenario and queue a
> reassignment of P into directory ID {{DirectoryId.LOST}}.
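
The detection the issue asks for can be illustrated with a minimal sketch. This is not Kafka's implementation: every class, field, and method name below is hypothetical, directories are modeled as plain strings, and the `LOST` constant is only a stand-in for the `DirectoryId.LOST` sentinel named in the issue. The sketch assumes the broker tracks, per in-progress move, whether the reassignment to the target directory has already been queued and whether promotion has completed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are Kafka's own classes.
final class FutureReplicaFailureSketch {

    // Stand-in for the sentinel the issue refers to as DirectoryId.LOST.
    static final String LOST = "LOST";

    // State of one intra-broker replica move, following the example scenario.
    static final class Move {
        final String partition;
        final String sourceDir;           // directory A: holds the current replica
        final String targetDir;           // directory B: holds the future replica
        final boolean reassignmentQueued; // controller was (or will be) told the replica is in B
        final boolean promoted;           // future replica has replaced the current one

        Move(String partition, String sourceDir, String targetDir,
             boolean reassignmentQueued, boolean promoted) {
            this.partition = partition;
            this.sourceDir = sourceDir;
            this.targetDir = targetDir;
            this.reassignmentQueued = reassignmentQueued;
            this.promoted = promoted;
        }
    }

    // A directory reassignment the broker would queue for the controller.
    static final class Reassignment {
        final String partition;
        final String directory;

        Reassignment(String partition, String directory) {
            this.partition = partition;
            this.directory = directory;
        }
    }

    // On a directory failure, find moves caught between "reassignment queued"
    // and "promotion complete" whose current replica lived in the failed
    // directory, and correct the controller's view by reassigning them to LOST.
    static List<Reassignment> onDirectoryFailure(String failedDir, List<Move> moves) {
        List<Reassignment> corrections = new ArrayList<>();
        for (Move m : moves) {
            boolean inRaceWindow = m.reassignmentQueued && !m.promoted;
            if (inRaceWindow && m.sourceDir.equals(failedDir)) {
                corrections.add(new Reassignment(m.partition, LOST));
            }
        }
        return corrections;
    }
}
```

In this model, a move whose reassignment was never queued needs no correction, since the controller still associates the replica with directory A and the ordinary failure notification covers it. A move that has already been promoted is unaffected by the loss of directory A.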