Jason Gustafson created KAFKA-10655:
---------------------------------------
Summary: Raft leader should resign after write failures
Key: KAFKA-10655
URL: https://issues.apache.org/jira/browse/KAFKA-10655
Project: Kafka
Issue Type: Sub-task
Reporter: Jason Gustafson
The controller's state machine relies on strong ordering guarantees. Each write
assumes that all previous writes are either committed or will eventually become
committed. In order to protect this assumption, the controller must not accept
additional writes in the same epoch if a preceding write has failed. Instead,
it should resign so that another controller can be elected. We consider three
classes of failures:
1. Serialization/state errors. Any unexpected write error should be
treated as fatal. The leader should gracefully resign and the process should
shut down.
2. Disk IO errors. Similarly, the leader should resign (gracefully if possible)
and the process should shut down.
3. Commit failures. If the leader is unable to commit data after some time,
then it should gracefully resign, but the process should not exit.
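
As a rough illustration of the three cases above, here is a minimal Java
sketch. All names (LeaderWriteGuard, Action, onWriteFailure) are hypothetical
and not part of Kafka's API. It assumes commit failures surface as
java.util.concurrent.TimeoutException and that every other error, including
java.io.IOException, is fatal.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not Kafka code: tracks whether a leader may still
// accept writes in its epoch and decides how to react to a failed write.
final class LeaderWriteGuard {
    // Possible reactions to a failed write.
    enum Action { RESIGN, RESIGN_AND_SHUTDOWN }

    private final int epoch;
    private boolean writeFailed = false;

    LeaderWriteGuard(int epoch) {
        this.epoch = epoch;
    }

    int epoch() {
        return epoch;
    }

    // Once any write has failed, no further writes are accepted in this epoch.
    boolean canAcceptWrites() {
        return !writeFailed;
    }

    // Classifies the failure. Assumption: a commit failure is reported as a
    // TimeoutException; serialization/state and disk IO errors are fatal.
    Action onWriteFailure(Throwable error) {
        writeFailed = true;
        if (error instanceof TimeoutException) {
            // Case 3: resign, but keep the process running.
            return Action.RESIGN;
        }
        // Cases 1 and 2: resign and shut the process down.
        return Action.RESIGN_AND_SHUTDOWN;
    }
}
```

The guard is per-epoch by design: a newly elected leader would start with a
fresh instance, so a failure in one epoch never blocks writes in a later one.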
--
This message was sent by Atlassian Jira
(v8.3.4#803005)