[ 
https://issues.apache.org/jira/browse/KAFKA-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-10655:
------------------------------------
    Description: 
The controller's state machine relies on strong ordering guarantees. Each write 
assumes that all previous writes are either committed or will eventually become 
committed. In order to protect this assumption, the controller must not accept 
additional writes in the same epoch if a preceding write has failed. Instead, 
it should resign so that another leader can be elected. We consider three 
classes of failures:

1. Serialization/state errors. Any unexpected write error should be treated as 
fatal. The leader should gracefully resign and the process should shut down.
2. Disk IO errors. Similarly, the leader should resign (gracefully if possible) 
and the process should shut down.
3. Commit failures. If the leader is unable to commit data after some time, 
then it should gracefully resign, but the process should not exit.



  was:
The controller's state machine relies on strong ordering guarantees. Each write 
assumes that all previous writes are either committed or will eventually become 
committed. In order to protect this assumption, the controller must not accept 
additional writes in the same epoch if a preceding write has failed. Instead, 
it should resign so that another controller can be elected. There are basically 
three classes of failures that we consider:

1. Serialization/state errors. Anything unexpected write errors should be 
treated as fatal. The leader should gracefully resign and the process should 
shutdown.
2. Disk IO errors. Similarly, the leader should resign (gracefully if possible) 
and the process should shutdown. 
3. Commit failures. If the leader is unable to commit data after some time, 
then it should gracefully resign, but the process should not exit.




> Raft leader should resign after write failures
> ----------------------------------------------
>
>                 Key: KAFKA-10655
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10655
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>
> The controller's state machine relies on strong ordering guarantees. Each 
> write assumes that all previous writes are either committed or will 
> eventually become committed. In order to protect this assumption, the 
> controller must not accept additional writes in the same epoch if a preceding 
> write has failed. Instead, it should resign so that another leader can be 
> elected. We consider three classes of failures:
> 1. Serialization/state errors. Any unexpected write error should be treated 
> as fatal. The leader should gracefully resign and the process should shut 
> down.
> 2. Disk IO errors. Similarly, the leader should resign (gracefully if 
> possible) and the process should shut down.
> 3. Commit failures. If the leader is unable to commit data after some time, 
> then it should gracefully resign, but the process should not exit.
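
The three failure classes above could be sketched roughly as follows. This is
only an illustration of the policy in the description, not the actual Kafka
Raft implementation: the class name LeaderWriteGuard, the Action enum, and the
choice of IOException and TimeoutException to stand in for disk IO errors and
commit failures are all assumptions made for the example.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names come from the Kafka code base.
final class LeaderWriteGuard {

    // What the leader does in response to a failed write.
    enum Action { RESIGN, RESIGN_AND_SHUT_DOWN }

    private final int epoch;
    private boolean writeFailed = false;

    LeaderWriteGuard(int epoch) {
        this.epoch = epoch;
    }

    int epoch() {
        return epoch;
    }

    // Once any write has failed, no further writes are accepted in this epoch.
    boolean canAcceptWrites() {
        return !writeFailed;
    }

    // Classify a write failure and decide how the leader should react.
    Action onWriteFailure(Throwable failure) {
        writeFailed = true;
        if (failure instanceof TimeoutException) {
            // Class 3 (assumed to surface as a timeout): the commit did not
            // complete in time. Resign, but keep the process running.
            return Action.RESIGN;
        }
        if (failure instanceof IOException) {
            // Class 2: disk IO error. Resign (gracefully if possible) and
            // shut the process down.
            return Action.RESIGN_AND_SHUT_DOWN;
        }
        // Class 1: serialization/state errors and anything else unexpected
        // are treated as fatal.
        return Action.RESIGN_AND_SHUT_DOWN;
    }
}
```

The guard is deliberately one-way: after a failure, canAcceptWrites() stays
false for the rest of the epoch, so a later write can never build on a
preceding write that may not become committed.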



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
