[ https://issues.apache.org/jira/browse/KAFKA-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Boyang Chen reassigned KAFKA-10655:
-----------------------------------
Assignee: Boyang Chen (was: Jason Gustafson)
> Raft leader should resign after write failures
> ----------------------------------------------
>
> Key: KAFKA-10655
> URL: https://issues.apache.org/jira/browse/KAFKA-10655
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Jason Gustafson
> Assignee: Boyang Chen
> Priority: Major
>
> The controller's state machine relies on strong ordering guarantees. Each
> write assumes that all previous writes are either committed or will
> eventually become committed. In order to protect this assumption, the
> controller must not accept additional writes in the same epoch if a preceding
> write has failed. Instead, it should resign so that another leader can be
> elected. We consider three classes of failure:
> 1. Serialization/state errors. Any unexpected write error should be treated
> as fatal. The leader should gracefully resign and the process should shut down.
> 2. Disk I/O errors. Similarly, the leader should resign (gracefully if
> possible) and the process should shut down.
> 3. Commit failures. If the leader is unable to commit data after some time,
> then it should gracefully resign, but the process should not exit.
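
The policy described above could be sketched roughly as follows. This is only an illustration, not the Kafka implementation: WriteFailurePolicy, Action and LeaderWriteGate are hypothetical names, and mapping a commit failure to TimeoutException is an assumption made for the example.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; none of these names come from the Kafka code base. */
final class WriteFailurePolicy {

    /** What the leader does after a failed write. Both actions include resigning. */
    enum Action {
        RESIGN_AND_SHUT_DOWN(true),
        RESIGN_ONLY(false);

        private final boolean shutDownProcess;

        Action(boolean shutDownProcess) {
            this.shutDownProcess = shutDownProcess;
        }

        boolean shutDownProcess() {
            return shutDownProcess;
        }
    }

    /** Maps a write failure to one of the three classes from the description. */
    static Action classify(Throwable failure) {
        // Class 3: commit failure (assumed here to surface as a timeout).
        // The leader resigns, but the process keeps running.
        if (failure instanceof TimeoutException) {
            return Action.RESIGN_ONLY;
        }
        // Class 2: disk I/O error. Resign (gracefully if possible), then shut down.
        if (failure instanceof IOException) {
            return Action.RESIGN_AND_SHUT_DOWN;
        }
        // Class 1: serialization/state or any other unexpected error is fatal.
        return Action.RESIGN_AND_SHUT_DOWN;
    }
}

/** Hypothetical guard: rejects writes in any epoch that has seen a failed write. */
final class LeaderWriteGate {
    private int failedEpoch = -1;

    void recordFailure(int epoch) {
        failedEpoch = Math.max(failedEpoch, epoch);
    }

    boolean mayWrite(int epoch) {
        // Only an epoch later than the failed one may accept writes again.
        return epoch > failedEpoch;
    }
}
```

The gate keeps the ordering assumption intact: once a write fails in an epoch, nothing else is appended in that epoch, and writes resume only after a new election yields a later epoch.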
--
This message was sent by Atlassian Jira
(v8.3.4#803005)