[ 
https://issues.apache.org/jira/browse/KAFKA-9733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Yu updated KAFKA-9733:
------------------------------
    Description: 
Note: Description still not finished.

The feature I'm proposing might not offer much of a performance boost, but 
I think it is still worth considering. In our current replication model, we 
have a single leader and several followers (including the ISR). However, 
the current bottleneck is that once the leader goes down, it takes a 
while to bring the next leader online, which is a serious pain and also leads 
to a considerable write/read delay.

To help alleviate this issue, we can consider multiple clusters that are 
independent of each other, i.e. each of them is its own leader/follower group 
for the _same partition set_. The difference here is that these clusters can 
_communicate_ with one another.

At first, this might seem redundant, but there is a reason for it:
 # Let's say we have two leader/follower groups (note that these two 
groups do _not_ have shared memory) for the same replicated partition.
 # One leader goes down, which means that its followers would, under normal 
circumstances, be unable to receive new writes.
 # However, in this situation, we can have those followers poll for their 
write/read requests from the other group, whose leader has _not gone down._ 
The source doesn't necessarily have to be the leader either; it can be another 
member of that group's ISR.
 # The idea here is that if the members of these two groups detect that they 
are lagging behind one another, they can poll one another for updates.
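
The steps above can be sketched roughly as follows. This is only an 
illustrative Python model, not Kafka code: the names Replica, ReplicaGroup, 
fetch_from and catch_up are hypothetical, and the sketch assumes that both 
groups hold identical logs up to the point where one leader fails, which the 
description does not yet specify.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One broker's copy of the partition log (hypothetical model)."""
    name: str
    log: list = field(default_factory=list)

    @property
    def end_offset(self) -> int:
        return len(self.log)

    def fetch_from(self, source: "Replica") -> int:
        """Copy the records this replica is missing from `source`."""
        missing = source.log[self.end_offset:]
        self.log.extend(missing)
        return len(missing)


@dataclass
class ReplicaGroup:
    """A leader plus its in-sync followers (`isr`)."""
    leader: Replica
    isr: list
    leader_alive: bool = True

    def append(self, record: str) -> None:
        """A write goes to the leader, then is replicated to the ISR."""
        if not self.leader_alive:
            raise RuntimeError("no leader available")
        self.leader.log.append(record)
        for follower in self.isr:
            follower.fetch_from(self.leader)


def catch_up(lagging: ReplicaGroup, healthy: ReplicaGroup) -> int:
    """Followers of a leaderless group poll the other group for updates."""
    # The source need not be the leader: use the most advanced live replica.
    sources = healthy.isr + ([healthy.leader] if healthy.leader_alive else [])
    source = max(sources, key=lambda r: r.end_offset)
    return sum(f.fetch_from(source) for f in lagging.isr
               if f.end_offset < source.end_offset)


# Step 1: two groups with no shared state, replicating the same partition.
group_a = ReplicaGroup(Replica("a-leader"), [Replica("a-f1"), Replica("a-f2")])
group_b = ReplicaGroup(Replica("b-leader"), [Replica("b-f1")])
for group in (group_a, group_b):
    group.append("r0")

# Step 2: group A's leader goes down; only group B keeps receiving writes.
group_a.leader_alive = False
group_b.append("r1")
group_b.append("r2")

# Steps 3 and 4: A's followers notice the lag and poll group B.
copied = catch_up(group_a, group_b)
```

Here the followers of group A end up with the same records as group B even 
though A has no leader. How diverging logs would be reconciled is left open, 
since the sketch simply assumes they never diverge.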

So how does this differ from just having multiple leaders in a single 
cluster?

The answer is that each leader is responsible for ensuring consistency only 
within _its own cluster,_ not within the other cluster it is communicating 
with.
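
One way to picture this scoping is with a per-group high watermark. In Kafka 
today the high watermark is the smallest log end offset across the ISR; the 
assumption in this sketch (not stated in the proposal) is that each group 
would keep computing it from its own replicas only. The function name and 
the offsets are made up for illustration.

```python
from typing import Sequence


def high_watermark(leader_end_offset: int,
                   isr_end_offsets: Sequence[int]) -> int:
    """Offset up to which every in-sync replica of ONE group has the data."""
    return min([leader_end_offset, *isr_end_offsets])


# Each leader only looks at its own group's replicas.
group_a_hw = high_watermark(10, [10, 8])
group_b_hw = high_watermark(7, [7, 7])
```

Neither leader needs to know the other group's offsets to decide what is 
committed within its own group.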

  was:
Note: Description still not finished.

The feature I'm proposing might not offer much of a performance boost, but 
I think it is still worth considering. In our current replication model, we 
have a single leader and several followers (including the ISR). However, 
the current bottleneck is that once the leader goes down, it takes a 
while to bring the next leader online, which is a serious pain and also leads 
to a considerable write/read delay.

To help alleviate this issue, we can consider multiple clusters that are 
independent of each other, i.e. each of them is its own leader/follower group 
for the _same partition set_. The difference here is that these clusters can 
_communicate_ with one another.

At first, this might seem redundant, but there is a reason for it:
 # Let's say we have two leader/follower groups for the same replicated 
partition.
 # One leader goes down, which means that its followers would, under normal 
circumstances, be unable to receive new writes.
 # However, in this situation, we can have those followers poll for their 
write/read requests from the other group, whose leader has _not gone down._ 
The source doesn't necessarily have to be the leader either; it can be another 
member of that group's ISR.
 # The idea here is that if the members of these two groups detect that they 
are lagging behind one another, they can poll one another for updates.

So how does this differ from just having multiple leaders in a single 
cluster?

The answer is that each leader is responsible for ensuring consistency only 
within _its own cluster,_ not within the other cluster it is communicating 
with.


> Consider addition to Kafka's replication model
> ----------------------------------------------
>
>                 Key: KAFKA-9733
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9733
>             Project: Kafka
>          Issue Type: New Feature
>          Components: clients, core
>            Reporter: Richard Yu
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
