[ https://issues.apache.org/jira/browse/KAFKA-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860591#comment-16860591 ]
Richard Yu commented on KAFKA-8516:
-----------------------------------

Well, this is where we start straying into an area called "consensus algorithms". Kafka's current leader-replica model resembles a leader-based algorithm known as Raft (research paper here: [https://raft.github.io/raft.pdf] ). If we wish to implement the write permissions part (which looks like a pretty big if), then we would perhaps have to consider something along the lines of EPaxos ( [https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf] ).

cc [~hachikuji] your thoughts on this?

> Consider allowing all replicas to have read/write permissions
> -------------------------------------------------------------
>
>                 Key: KAFKA-8516
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8516
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Richard Yu
>            Priority: Major
>
> Currently, in Kafka internals, a leader is responsible for all the read and
> write operations requested by the user. This naturally creates a bottleneck:
> the leader experiences a significantly heavier workload than the other
> replicas, and all client commands must pass through a single chokepoint. If
> a leader fails, all processing effectively comes to a halt until a new
> leader is elected. To help solve this problem, we could think about
> redesigning Kafka core so that any replica is able to do read and write
> operations as well. That is, the system would be changed so that _all_
> replicas have read/write permissions.
>
> This has several benefits, notably:
> * Workload can be distributed more evenly. Today leader replicas carry more
> load than follower replicas; in the new design, all replicas are equal.
> * Some failures would not be as catastrophic as in the leader-follower
> paradigm. There is no single "leader". If one replica goes down, the others
> are still able to read/write as needed, so processing could continue
> without interruption.
>
> The implementation of such a change would be very extensive, and discussion
> is needed to decide whether the improvement described above warrants such a
> drastic redesign of Kafka internals.
> Relevant KIP for read permissions can be found here:
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)