[ 
https://issues.apache.org/jira/browse/KAFKA-9733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Yu updated KAFKA-9733:
------------------------------
    Description: 
Note: Description still not finished. Still not sure if this is needed.

Kafka's current replication model (with its single leader and several 
followers) is somewhat similar to the consensus algorithms currently used 
in databases (Raft), with the major difference being the existence of the ISR. 
Consequently, Kafka suffers from the same fault-tolerance issues as other 
distributed systems that rely on Raft: the leader tends to be the chokepoint 
for failures, i.e. if it goes down, there is a brief stop-the-world effect. 
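
To make the ISR difference concrete, here is a minimal sketch; it is not 
Kafka's code or any Raft library's code. It assumes each replica is summarized 
only by its log end offset, it ignores Raft's term checks, and the function 
names are hypothetical. A Raft-style leader commits what a majority of all 
replicas hold, while a Kafka leader advances the high watermark to the smallest 
log end offset among the in-sync replicas.

```python
def raft_commit_index(log_end_offsets):
    """Highest offset held by a majority of all replicas (leader included).

    Simplified: real Raft also requires the entry to be from the current term.
    """
    ordered = sorted(log_end_offsets.values(), reverse=True)
    majority = len(ordered) // 2 + 1
    return ordered[majority - 1]


def kafka_high_watermark(log_end_offsets, isr):
    """Smallest log end offset among the replicas currently in the ISR."""
    return min(log_end_offsets[replica] for replica in isr)


# Hypothetical three-replica partition; replica "c" has fallen behind.
offsets = {"a": 10, "b": 9, "c": 4}
majority_commit = raft_commit_index(offsets)
hw_with_lagging_replica = kafka_high_watermark(offsets, {"a", "b", "c"})
hw_after_isr_shrink = kafka_high_watermark(offsets, {"a", "b"})
```

With the slow replica still in the ISR, the high watermark is held back to its 
offset; once the ISR shrinks to the two caught-up replicas, it matches the 
majority result. In both models, writes still flow through the single leader.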

In contrast, giving all replicas the power to read from and write to other 
replicas is also difficult to accomplish (as the complexity of the 
Egalitarian Paxos algorithm shows), since consistency is hard to maintain in 
such an algorithm and the gain is small compared to the overhead. 

Therefore, I propose an intermediate step between these two approaches: the 
leader partition quorum.

 

  was:
Note: Description still not finished. Still not sure if this is needed.

The feature I'm proposing might not offer much of a performance boost, but 
I think it is still worth considering. In our current replication model, we 
have a single leader and several followers (with our ISR included). However, 
the current bottleneck is that once the leader goes down, it takes a while to 
bring the next leader online, which is a serious pain and also leads to a 
considerable write/read delay.

To help alleviate this issue, we can consider multiple clusters that are 
independent of each other, i.e. each is its own leader/follower group 
for the _same partition set_. The difference here is that these clusters can 
_communicate_ with one another. 

At first, this might seem redundant, but there is a reason for it:
 # Let's say we have two leader/follower groups (I must note that these two 
groups do _not_ have shared memory) for the same replicated partition.
 # One leader goes down, which means that its followers would, under normal 
circumstances, be unable to receive new write updates.
 # However, in this situation, we can have those followers poll their 
write/read requests from the other group, whose leader has _not gone down._ The 
source doesn't necessarily have to be the leader either; it can be another 
member of that group's ISR. 
 # The idea here is that if the members of these two groups detect that they 
are lagging behind one another, they can poll one another for updates.
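
The steps above can be sketched roughly as follows. This is only an 
illustration under stated assumptions, not a design: the `Replica` and `Group` 
classes and the `catch_up` function are hypothetical, logs are plain in-memory 
lists, and both groups are assumed to hold the same records in the same order 
(how that ordering would be guaranteed across groups is not addressed here).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    log: list = field(default_factory=list)

    def fetch_from(self, offset):
        """Return the records this replica holds from `offset` onward."""
        return self.log[offset:]


@dataclass
class Group:
    leader: Replica
    followers: list
    leader_alive: bool = True

    def live_members(self):
        """All replicas that can currently serve fetches."""
        if self.leader_alive:
            return [self.leader] + list(self.followers)
        return list(self.followers)


def catch_up(follower, own_group, other_group):
    """Copy missing records from the own leader, or from the other group
    when the own leader is down. Returns the number of records copied."""
    if own_group.leader_alive:
        source = own_group.leader
    else:
        candidates = other_group.live_members()
        if not candidates:
            return 0  # nobody to poll; wait for a leader election
        # Any live member will do, not only the leader: pick the one furthest ahead.
        source = max(candidates, key=lambda replica: len(replica.log))
    missing = source.fetch_from(len(follower.log))
    follower.log.extend(missing)
    return len(missing)


# Group 1 loses its leader; its follower polls group 2 instead.
group1 = Group(Replica("l1", ["a", "b", "c"]), [Replica("f1", ["a"])])
group2 = Group(Replica("l2", ["a", "b", "c", "d"]), [Replica("f2", ["a", "b"])])
group1.leader_alive = False
copied = catch_up(group1.followers[0], group1, group2)
```

In this toy run the follower in the leaderless group copies the three records 
it is missing from the other group's most up-to-date live member, so it keeps 
receiving updates while its own group has no leader.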

So how does this differ from just having multiple leaders in a single 
cluster?

The answer is that each leader is responsible for ensuring consistency only 
within _its own cluster,_ not within the other cluster it is communicating 
with.  


> Consider addition of leader partition quorum
> --------------------------------------------
>
>                 Key: KAFKA-9733
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9733
>             Project: Kafka
>          Issue Type: New Feature
>          Components: clients, core
>            Reporter: Richard Yu
>            Priority: Minor
>
> Note: Description still not finished. Still not sure if this is needed.
> Kafka's current replication model (with its single leader and several 
> followers) is somewhat similar to the consensus algorithms currently used 
> in databases (Raft), with the major difference being the existence of the ISR. 
> Consequently, Kafka suffers from the same fault-tolerance issues as other 
> distributed systems that rely on Raft: the leader tends to be the 
> chokepoint for failures, i.e. if it goes down, there is a brief 
> stop-the-world effect. 
> In contrast, giving all replicas the power to read from and write to other 
> replicas is also difficult to accomplish (as the complexity of the 
> Egalitarian Paxos algorithm shows), since consistency is hard to maintain in 
> such an algorithm and the gain is small compared to the overhead. 
> Therefore, I propose an intermediate step between these two approaches: the 
> leader partition quorum.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
