[jira] [Assigned] (KAFKA-10655) Raft leader should resign after write failures

Boyang Chen (Jira) Tue, 17 Nov 2020 10:29:07 -0800


     [ https://issues.apache.org/jira/browse/KAFKA-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Boyang Chen reassigned KAFKA-10655:
-----------------------------------

    Assignee: Boyang Chen  (was: Jason Gustafson)

> Raft leader should resign after write failures
> ----------------------------------------------
>
>                 Key: KAFKA-10655
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10655
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Jason Gustafson
>            Assignee: Boyang Chen
>            Priority: Major
>
> The controller's state machine relies on strong ordering guarantees. Each 
> write assumes that all previous writes are either committed or will 
> eventually become committed. In order to protect this assumption, the 
> controller must not accept additional writes in the same epoch if a preceding 
> write has failed. Instead, it should resign so that another leader can be 
> elected. There are basically three classes of failures that we consider:
> 1. Serialization/state errors. Any unexpected write error should be treated 
> as fatal. The leader should gracefully resign and the process should shut down.
> 2. Disk IO errors. Similarly, the leader should resign (gracefully if 
> possible) and the process should shut down. 
> 3. Commit failures. If the leader is unable to commit data within some 
> bounded time, it should gracefully resign, but the process should not exit.
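The policy above (stop accepting writes in the current epoch after any failure, then resign with or without shutting down depending on the failure class) can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not Kafka's KafkaRaftClient code or API. Every name here is hypothetical: `LeaderState`, `append`, `on_commit`, `poll`, `write_fn`, `commit_timeout_ms`, `FailureClass` and `POLICY`. The sketch also assumes that an `OSError` from the write function signals a disk IO error, that any other exception signals a serialization/state error, and that a write counts as committed once the high watermark moves past its offset.

```python
from enum import Enum, auto


class FailureClass(Enum):
    """The three failure classes from the issue description."""
    SERIALIZATION_OR_STATE = auto()  # unexpected write error: fatal
    DISK_IO = auto()                 # resign (gracefully if possible), then shut down
    COMMIT_TIMEOUT = auto()          # resign gracefully, but keep the process running


class Action(Enum):
    RESIGN_AND_SHUTDOWN = auto()
    RESIGN_ONLY = auto()


# Hypothetical policy table mapping each failure class to the leader's response.
POLICY = {
    FailureClass.SERIALIZATION_OR_STATE: Action.RESIGN_AND_SHUTDOWN,
    FailureClass.DISK_IO: Action.RESIGN_AND_SHUTDOWN,
    FailureClass.COMMIT_TIMEOUT: Action.RESIGN_ONLY,
}


class LeaderState:
    """Hypothetical leader that refuses further writes in its epoch after a failure."""

    def __init__(self, epoch, commit_timeout_ms):
        self.epoch = epoch
        self.commit_timeout_ms = commit_timeout_ms
        self.resigned = False
        self.shutdown_requested = False
        self.next_offset = 0
        self.pending = []  # (offset, append_time_ms) of uncommitted writes, oldest first

    def append(self, record, now_ms, write_fn):
        # A failed write may leave a gap, so no later write in this epoch
        # can safely assume that all earlier writes will become committed.
        if self.resigned:
            raise RuntimeError(f"leader for epoch {self.epoch} has resigned")
        try:
            write_fn(record)
        except OSError:
            self._fail(FailureClass.DISK_IO)
            raise
        except Exception:
            self._fail(FailureClass.SERIALIZATION_OR_STATE)
            raise
        offset = self.next_offset
        self.next_offset += 1
        self.pending.append((offset, now_ms))
        return offset

    def on_commit(self, high_watermark):
        # Offsets strictly below the high watermark are committed.
        self.pending = [(o, t) for (o, t) in self.pending if o >= high_watermark]

    def poll(self, now_ms):
        # Class 3: the oldest uncommitted write has waited longer than the timeout.
        if self.resigned or not self.pending:
            return
        if now_ms - self.pending[0][1] > self.commit_timeout_ms:
            self._fail(FailureClass.COMMIT_TIMEOUT)

    def _fail(self, failure):
        # Resigning lets another voter be elected in a new epoch.
        self.resigned = True
        if POLICY[failure] is Action.RESIGN_AND_SHUTDOWN:
            self.shutdown_requested = True
```

For example, a leader with `commit_timeout_ms=1000` whose write at time 100 is still uncommitted at time 1200 resigns on `poll(1200)` and keeps `shutdown_requested` false. A write function that raises `OSError` makes the leader resign and request shutdown. Either way, every later `append` in that epoch is rejected.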



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
